Texture information giving method, object extracting method, three-dimensional model generating method and apparatus for the same

ABSTRACT

The present method represents a three-dimensional shape model by polygons according to a plurality of object images information picked up by rotating a real object by an arbitrary angle at a time, and assigns texture information to each polygon from the object image information in which the relevant polygon has the largest projection area. In order to improve the color continuity between adjacent polygons, the object image information having correspondence with a polygon of interest and with an adjacent polygon thereof is selected so that the respective object image information approximate each other in shooting position and shooting direction. An alternative method divides an object image into a plurality of regions, obtains the difference between the object image and a background image at the region level, outputs the mean value of the absolute values of the difference at the region level, and detects a region having a mean value of absolute differences equal to or greater than a threshold value as the object portion. A further method obtains a plurality of object images by shooting only the background of an object of interest and by shooting the object of interest during each rotation. A silhouette image is generated by carrying out a difference process between the object image and the background image. A voting process is carried out on the voxel space on the basis of the silhouette images. A polygon is generated according to the three-dimensional shape obtained by the voting process. The texture obtained from the object images is mapped to the polygons.

TECHNICAL FIELD

The present invention relates to a texture information assignment method of assigning texture information to a shape model of a real object of interest according to an object image obtained by shooting that real object of interest, an object extraction method of extracting an object portion by removing an undesired portion such as the background from the object image, a three-dimensional model generation method of generating a three-dimensional model of an object of interest, and apparatus for these methods.

BACKGROUND ART

In accordance with the development of computer graphics and the like, there have been intensive efforts to provide a system for practical usage in three-dimensional graphics. However, one appreciable problem accompanying the spread of such a system of practical usage is the method of obtaining shape data. More specifically, the task of entering into a computer the complicated three-dimensional shape of an object that has a free-form surface or that resides in the natural world is extremely tedious and difficult.

Furthermore, in reconstructing an object with a computer and the like, it is difficult to express the texture of the surface of the object in a more realistic manner by just simply reconstructing the shape of the object.

Three-dimensional image information can be handled more easily if the shape information and color/texture information can be reconstructed within the computer based on image information that is obtained by shooting an actual object.

In three-dimensional image communication such as, for example, over the Internet, the opportunity for a general user, who is the transmitter of information, to create a three-dimensional image will increase. Therefore, the need arises for a simple and compact apparatus that produces a three-dimensional image.

(1) Japanese Patent Laying-Open No. 5-135155 discloses a three-dimensional model generation apparatus that can construct a three-dimensional model from a series of silhouette images of an object of interest placed on a turntable under the condition of normal illumination.

According to this three-dimensional model construction apparatus, an object of interest that is rotated on a turntable is continuously shot by a camera. The silhouette image of the object of interest is extracted from the obtained image by an image processing computer. By measuring, for each silhouette image, the horizontal distance from the contour of the silhouette image to the vertical axis of rotation, a three-dimensional model is generated according to this horizontal distance and the angle of rotation. More specifically, the contour of the object of interest is extracted from the continuously shot silhouette images to be displayed as a three-dimensional model.

FIG. 1 is a diagram representing the concept of assigning texture information to the three-dimensional model generated as described above according to the image information continuously picked up by a camera.

Japanese Patent Laying-Open No. 5-135155 discloses the case of obtaining image information by continuously rotating an object of interest and shooting the same, i.e., obtaining image information in the resolution level of shape recognition with respect to a three-dimensional model of a human figure. More specifically, an image is picked up for every 1° of rotation to obtain 360 images with respect to the object of interest.

For the sake of simplifying the description, the case of shooting an image at a larger step angle will be described hereinafter. However, the essence of the method is identical.

Consider the case of picking up a total of n images by rotating an object of interest for every predetermined angle of rotation, as shown in FIG. 1. In this case, each image information corresponds to the label number of 1, 2, 3, . . . , n.

The object of interest is represented as a shape model (wire frame model) 300 using a polygon (triangular patch). When texture information is to be assigned to shape model 300, color information (texture information) of the image information of a corresponding label number is assigned for each triangular patch according to the direction of the camera shooting the object of interest.

More specifically, based upon the vector directed toward the target triangular patch from the axis of rotation of shape model 300, the texture information for the triangular patch is captured from the image whose shooting direction vector most closely matches this vector. Alternatively, from an intuitive standpoint, a plurality of lines such as the circles of longitude of a terrestrial globe can be assumed with respect to the surface of the model. Texture information can be captured from the first image information for a triangular patch in the range of 0° to 1×360/n°, from the second image information for a triangular patch in the range of 1×360/n° to 2×360/n°, and so on. This method of capturing texture information will be referred to as the central projection system hereinafter.
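Stated more concretely, the central projection system amounts to a simple index computation. The following is a minimal sketch of that idea; the function name and the 1-based label numbering are introduced here only for illustration and are not part of the original disclosure.

```python
def image_label_for_patch(patch_angle_deg, n_images):
    """Central projection: return the label number (1..n_images) of the
    object image whose angular range contains the triangular patch.

    patch_angle_deg: angle of the vector pointing from the axis of
                     rotation toward the triangular patch, in degrees.
    n_images:        number of object images picked up in one rotation.
    """
    step = 360.0 / n_images
    return int((patch_angle_deg % 360.0) // step) + 1

# With n = 36 images (one shot per 10 degrees of rotation),
# a patch at 37 degrees falls in the range 30-40 degrees, i.e. image 4.
print(image_label_for_patch(37.0, 36))  # -> 4
```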

The central projection system is advantageous in that image information can be provided in a one-to-one correspondence with respect to each triangular patch or the constituent element forming the shape model (referred to as “three-dimensional shape constituent element” hereinafter), and that this correspondence can be determined easily.

However, the central projection system is disadvantageous in that, since texture information is assigned from different image information (image information of a different label number) to three-dimensional shape constituent elements that are not present within the same range of rotation angle when viewed from the axis of rotation, the joint of the texture is noticeable when the gloss or the texture of the color information differs slightly due to the illumination and the like.

Furthermore, a corresponding three-dimensional shape constituent element may be occluded in the image information obtained from a certain direction of pickup depending upon the shape of the object of interest. There is a case where no texture information corresponding to a certain three-dimensional shape constituent element is included in the corresponding image information.

FIG. 2 is a diagram for describing such a situation. FIG. 2 shows the relationship among the axis of rotation, the cross section of the object of interest, and the object image projected in the camera, at a vertical plane including the axis of rotation of the object of interest. When the object of interest takes a shape that has an occluded region that cannot be viewed from the camera as shown in FIG. 2, the image information picked up from this angular direction lacks the texture information corresponding to this occluded region. However, texture information of this occluded region can be captured from another pickup direction that forms a certain angle with respect to the previous direction of pickup.

(2) As a conventional method, extraction of an object portion from an image of an object can be effected manually using an auxiliary tool. More specifically, the image of an object obtained by shooting the target object together with the background is divided into a plurality of regions. The operator selects the background area in the image of the object to erase the background area using a mouse or the like. However, this method is disadvantageous in that the burden on the operator for the manual task is too heavy.

Another conventional method of object extraction employs the chroma-key technique. More specifically, the portion of the object is extracted from the image of the object using a backboard of the same color. However, this method is disadvantageous in that a special environment of a backboard of the same color has to be prepared.

A further conventional method of object extraction employs the simple difference method. More specifically, difference processing is effected between an object image and a background image in which only the background of the object of interest is shot to obtain the difference. The area having an absolute value of the difference greater than a threshold value is extracted as the portion of the object. However, there is a problem that, when the object of interest includes an area of a color identical to the color of the background, that portion cannot be extracted as a portion of the object. In other words, this method is disadvantageous in that the extraction accuracy of the object portion is poor.

Another conventional method of object extraction takes advantage of the depth information by the stereo method. More specifically, the area with the depth information that is smaller than a threshold value is extracted as the portion of an object of interest from an image of the object obtained by shooting the object together with the background. However, the difference in depth is so great in the proximity of the boundary between the object of interest and the background that proper depth information cannot be obtained reliably. There is a problem that a portion of the background is erroneously extracted as a portion of the object.

All of the above-described conventional methods require the determination of a threshold value in advance. It is extremely difficult to determine an appropriate threshold value on account of the conversion property of the A/D converter for converting the image and the property of the illumination. There is also the problem that the threshold value must be reselected when the conversion characteristic of the A/D converter or the property of the illumination is changed.

(3) A three-dimensional digitizer is known as a conventional apparatus for reading out the shape of an object of interest. The three-dimensional digitizer includes an arm with a plurality of articulations and a pen. The operator provides control so as to bring the pen in contact with the object of interest. The pen is moved along on the object of interest. The angle of the articulations of the arm varies as the pen is moved. A three-dimensional shape of the object of interest is obtained according to the angle information of the articulations of the arm. However, such a digitizer is disadvantageous in that the time and labor of the measurement task carried out by manual means are too great.

The laser scanner is known as another conventional apparatus. The laser scanner directs a laser beam on an object of interest to scan the object. As a result, a three-dimensional shape of the object of interest is obtained. There is a problem that a three-dimensional model of an object of interest formed of a substance that absorbs light cannot be obtained with such a laser scanner. There is also the problem that the apparatus is extremely complex and costly. Furthermore, there is a problem that the environment for pickup is limited since measurement of the object of interest must be carried out in a dark room. There is also the problem that color information cannot be easily input.

U.S. Pat. No. 4,982,438 discloses a three-dimensional model generation apparatus. This apparatus computes a hypothetical existing region using the silhouette image of an object of interest. This hypothetical existing region is a conical region with the projection center of the camera as the vertex and the silhouette of an object of interest as the cross section. This conical region (hypothetical existing region) is described with a voxel model. This process is carried out for a plurality of silhouette images. Then, a common hypothetical existing region is obtained to generate a three-dimensional model of the object of interest. Here, the common hypothetical existing region is the ANDed area of a plurality of hypothetical existing regions with respect to the plurality of silhouette images. However, there is a problem that a three-dimensional model of high accuracy cannot be generated when there is one inaccurate silhouette image, since the three-dimensional shape is obtained by the AND operation. There is also a problem that color information is insufficient or a local concave area cannot be recognized since the object of interest is shot only from a horizontal direction (direction perpendicular to the axis of rotation).

In the above three-dimensional model generation apparatus of Japanese Patent Laying-Open No. 5-135155, an object of interest that is rotating on a turntable is shot by a camera to obtain a plurality of silhouette images. A plurality of shapes of the object of interest at a plurality of horizontal planes (planes perpendicular to the axis of rotation) are obtained on the basis of these silhouette images. The points on the contour lines of the shape of the object of interest in adjacent horizontal planes are connected as triangular patches. The point on the contour line of the shape of the object of interest in one horizontal plane is determined for every predetermined angle. A three-dimensional model of an object of interest is generated in this way. However, there is a problem in this apparatus that a special environment for shooting is required since a backboard to generate a silhouette image is used. Furthermore, the amount of data is great since the three-dimensional model is generated using the shape of the object of interest in a plurality of horizontal planes. There is also a problem that the process is time consuming.

In view of the foregoing, an object of the present invention is to provide a method and apparatus of texture information assignment that allows assignment of texture information to each three-dimensional shape constituent element forming a shape model regardless of the shape of the object of interest in the event of reconstructing a three-dimensional model within a computer and the like according to image information obtained by shooting a real object.

Another object of the present invention is to provide a method and apparatus of texture information assignment that allows assignment of texture information approximating the texture of a real object from image information obtained by shooting the real object in the assignment of texture information to a shape model according to picked up image information.

A further object of the present invention is to provide a method and apparatus of texture information assignment with less noticeable discontinuity (seam) in the texture assigned to each three-dimensional shape constituent element constructing a shape model in assigning texture information to the shape model according to image information obtained by shooting a real object.

Still another object of the present invention is to provide a method and apparatus of object extraction that allows a portion of an object of interest having a color identical to that of the background, if present, to be extracted.

A still further object of the present invention is to provide a method and apparatus of object extraction that can always extract a portion of an object stably and properly even when various characteristics change.

Yet a further object of the present invention is to provide a method and apparatus of object extraction that reduces the manual task and dispenses with a special shooting environment.

Yet another object of the present invention is to provide a method and apparatus of three-dimensional model generation that reduces the manual task.

Yet a still further object of the present invention is to provide a method and apparatus of three-dimensional model generation of a simple structure with few limitations on the shooting environment and the substance of the object of interest.

An additional object of the present invention is to provide a method and apparatus of three-dimensional model generation that can generate a three-dimensional model with high accuracy even if there are several inaccurate ones in a plurality of silhouette images.

Still a further object of the present invention is to provide a method and apparatus of three-dimensional model generation in which sufficient color information can be obtained and that allows recognition of a local concave portion in an object of interest.

Yet a still further object of the present invention is to provide a method and apparatus of three-dimensional model generation that can generate a three-dimensional model at high speed with fewer data to be processed and that dispenses with a special shooting environment.

DISCLOSURE OF THE INVENTION

According to an aspect of the present invention, a texture information assignment apparatus for a shape model includes: means for describing the shape of an object of interest as a shape model by a set of a plurality of three-dimensional shape constituent elements; and means for assigning texture information with respect to a shape model, per three-dimensional shape constituent element, according to the texture information amount for the three-dimensional shape constituent element in each object image information, on the basis of a plurality of object images information captured by shooting the object of interest from different viewpoints.

Preferably, the texture information amount is represented by the matching degree between the direction of the surface normal of each three-dimensional shape constituent element and the shooting direction of each object image information per three-dimensional shape constituent element.

Preferably, the texture information amount is represented by the area of the three-dimensional shape constituent element that is projected on each object image information per three-dimensional shape constituent element.
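For illustration only, these two measures of the texture information amount can be written as follows; the symbols are introduced here and do not appear in the original description:

```latex
% matching degree between the unit surface normal n_i of the i-th element
% and the unit shooting direction d_k of the k-th object image information,
% and the area S_i(k) of the i-th element projected on that image:
m_i(k) = -\,\mathbf{n}_i \cdot \mathbf{d}_k , \qquad a_i(k) = S_i(k)
```

The larger either quantity is, the more texture information the k-th object image information carries for the i-th three-dimensional shape constituent element.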

According to another aspect of the present invention, a texture information assignment apparatus for a shape model includes: means for describing the shape of an object of interest as a shape model by a set of a plurality of three-dimensional shape constituent elements; and means for assigning per three-dimensional shape constituent element the texture information for a shape model according to both the texture information amount for the three-dimensional shape constituent element of each object image information and the texture continuity between three-dimensional shape constituent elements on the basis of a plurality of object images information captured by shooting the object of interest from different viewpoints.

Preferably, the texture information assignment means assigns the texture information for a shape model from the object image information provided in correspondence with each three-dimensional shape constituent element so as to minimize an evaluation function that decreases in accordance with an increase of the texture information amount and that decreases in accordance with improvement in texture continuity between three-dimensional shape constituent elements.

In the above evaluation function, the texture continuity is represented as a function of the difference in the shooting position and the shooting direction of the respective corresponding object image information between a three-dimensional shape constituent element of interest and an adjacent three-dimensional shape constituent element.

Preferably in the above evaluation function, the texture continuity is represented as a function that increases in accordance with a greater difference between the label number assigned to a three-dimensional shape constituent element of interest and the label number assigned to a three-dimensional shape constituent element that is adjacent to the three-dimensional shape constituent element of interest when object image information is picked up according to change in position and a label number is applied to each object image information corresponding to the change in position.

Preferably in the above evaluation function, the texture continuity is represented as a function that increases in accordance with a greater difference between the label number assigned to a three-dimensional shape constituent element of interest and the label number assigned to a three-dimensional shape constituent element adjacent to the three-dimensional shape constituent element of interest when object image information is picked up according to a regular change in position and a label number is applied to each object image information corresponding to the change in position.

Preferably in the above evaluation function, the texture information amount is represented as a function of an area of a three-dimensional shape constituent element projected on each object image information per three-dimensional shape constituent element.

Preferably in the above evaluation function, the texture information amount is represented as a function of a level of match between the direction of the surface normal of each three-dimensional shape constituent element and the shooting direction of each object image information per three-dimensional shape constituent element.

Preferably, the above evaluation function is represented as a linear combination of the total sum, for all three-dimensional shape constituent elements, of the difference between the label number assigned to the i-th (i: natural number) three-dimensional shape constituent element and the label number assigned to the three-dimensional shape constituent element adjacent to the i-th three-dimensional shape constituent element, and the total sum, for all three-dimensional shape constituent elements, of the area of the i-th three-dimensional shape constituent element projected on the object image information corresponding to the label number assigned to the i-th three-dimensional shape constituent element.
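As a hedged illustration of such a linear combination (the weights α and β, the adjacency set N(i), and the sign convention below are assumptions introduced only to illustrate the description above), the evaluation function to be minimized can be sketched as:

```latex
E(l_1,\ldots,l_N)
  = \alpha \sum_{i=1}^{N} \sum_{j \in N(i)} \lvert l_i - l_j \rvert
  - \beta  \sum_{i=1}^{N} S_i(l_i), \qquad \alpha, \beta > 0
```

Here l_i is the label number assigned to the i-th three-dimensional shape constituent element, N(i) is the set of elements adjacent to it, and S_i(l_i) is the area of the i-th element projected on the object image information of label l_i. The first term grows when texture continuity is poor and the second term lowers E when the texture information amount is large, so minimizing E balances the two requirements.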

According to a further aspect of the present invention, a texture information assignment apparatus for a shape model includes: means for describing the shape of an object of interest as a shape model by a set of a plurality of three-dimensional shape constituent elements; means for providing correspondence between a label number and every three-dimensional shape constituent element so as to minimize an evaluation function that decreases in accordance with increase of a texture information amount for each three-dimensional shape constituent element and that decreases in accordance with improvement of texture continuity in the texture information assigned to each three-dimensional shape constituent element and an adjacent three-dimensional shape constituent element when a plurality of object images information are picked up in accordance with change in position and a label number is applied to each object image information corresponding to the change in position; and means for assigning texture information to a three-dimensional shape constituent element by carrying out a weighted mean process according to the area of the three-dimensional shape constituent element projected on each object image information, on the basis of the object image information corresponding to the related label number and the object image information corresponding to a predetermined number of label numbers including that related label number.

Preferably, the means for assigning texture information to the three-dimensional shape constituent element obtains, for the three-dimensional shape constituent element, the area projected on the object image information corresponding to the label number related to the three-dimensional shape constituent element and on the object image information corresponding to the predetermined number of label numbers including the related label number, and uses this as the weighting factor in carrying out the weighted mean process. For the texture information of the three-dimensional shape constituent element, the portion of the three-dimensional shape constituent element projected on the object image information is obtained. The image information (color, density or luminance) of this projected portion is subjected to a weighted mean process to result in the texture information.

According to still another aspect of the present invention, a texture information assignment apparatus for a shape model includes: means for describing the shape of an object of interest as a shape model by a set of a plurality of three-dimensional shape constituent elements; means for providing correspondence between a label number and every three-dimensional shape constituent element so as to minimize an evaluation function that decreases in accordance with increase of the texture information amount for each three-dimensional shape constituent element and that decreases in accordance with improvement in texture continuity of the texture information respectively assigned to each three-dimensional shape constituent element and an adjacent three-dimensional shape constituent element when a plurality of object image information are picked up according to a regular change in position and a label number is applied to each object image information corresponding to the change in position; and means for assigning texture information to a three-dimensional shape constituent element by carrying out a weighted mean process according to the area of the three-dimensional shape constituent element projected on each object image information, on the basis of the object image information corresponding to a related label number and the object image information corresponding to a predetermined number of label numbers including that related label number.

Preferably, the means for assigning texture information to a three-dimensional shape constituent element obtains, for the three-dimensional shape constituent element, the area projected on the object image information corresponding to the label number related to the three-dimensional shape constituent element and on the object image information corresponding to the predetermined number of label numbers including the related label number, and uses this as the weighting factor for the weighted mean process. For the texture information of a three-dimensional shape constituent element, the portion where the three-dimensional shape constituent element is projected on the object image information is obtained. The image information (color, density or luminance) of this projected portion is subjected to a weighted mean process to result in the texture information.

According to a still further aspect of the present invention, a texture information assignment apparatus for a shape model includes: means for capturing a plurality of object images information by shooting an object of interest from different viewpoints; means for describing the shape of the object of interest as a shape model by a set of a plurality of three-dimensional shape constituent elements; and means for assigning texture information obtained by carrying out a weighted mean process for all the object image information according to the area corresponding to the three-dimensional shape constituent element projected on the plurality of object images information for every three-dimensional shape constituent element.

Preferably, the means for assigning texture information to the three-dimensional shape constituent element obtains the area projected on the object image information for each three-dimensional shape constituent element, and uses the obtained area as the weighting factor in carrying out the weighted mean process. For the texture information of the three-dimensional shape constituent element, the portion of the three-dimensional shape constituent element projected on the object image information is obtained. The image information (color, density or luminance) of this projected portion is subjected to a weighted mean process to result in the texture information.
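As a minimal sketch of the weighted mean process described above (the function, its inputs and the NumPy usage are illustrative assumptions; the text only specifies that the projected areas are used as weighting factors for the image information of the projected portions):

```python
import numpy as np

def blend_texture(projected_colors, projected_areas):
    """Weighted mean of the image information sampled for one
    three-dimensional shape constituent element.

    projected_colors: (n_images, 3) array; mean color of the portion of the
                      element projected on each object image (hypothetical input).
    projected_areas:  (n_images,) array; area of the element projected on each
                      object image, used as the weighting factor
                      (zero where the element is occluded or not visible).
    """
    w = np.asarray(projected_areas, dtype=float)
    c = np.asarray(projected_colors, dtype=float)
    if w.sum() == 0.0:
        return None  # element visible in no object image
    return (w[:, None] * c).sum(axis=0) / w.sum()

# Usage: three object images; the element projects largest onto the second one,
# so that image dominates the blended texture.
colors = [[200, 60, 40], [210, 70, 45], [180, 50, 30]]
areas = [12.0, 48.0, 3.5]
print(blend_texture(colors, areas))
```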

According to the texture information assignment apparatus, when the shape model is reconstructed within a computer on the basis of image information obtained by shooting an actual object, the texture information most appropriate to the actual object can be selectively assigned to the shape model out of the plurality of image information obtained by shooting the object of interest.

When texture information (color information) is to be assigned to the shape model represented as a set of a plurality of three-dimensional shape constituent elements, the texture information most approximating the texture information of the actual object can be selectively assigned to each three-dimensional shape constituent element while suppressing discontinuity in texture information between respective three-dimensional shape constituent elements.

Since the process of assigning texture information can be recast as a labeling problem for each three-dimensional shape constituent element on the basis of the object image information obtained by shooting an actual object of interest, the process of applying the texture information to each three-dimensional shape constituent element can be carried out in a procedure suitable for computer processing and the like.

According to yet a further aspect of the present invention, an object extraction apparatus of extracting a portion of an object with an unwanted area removed from an object image obtained by shooting an object of interest includes: region segmentation means and extraction means. The region segmentation means divides the object image into a plurality of regions. The extraction means identifies and extracts an object portion in the object image by subjecting the information of each pixel in the object image to a process of consolidation for every region. Here, an unwanted portion is, for example, the background area.

Preferably in the extraction means, the process of consolidating the information of each pixel in the object image for every region is to average the information of each pixel in the object image for every region.

Preferably, the extraction means identifies and extracts the object portion in the object image by carrying out a thresholding process on the information of each pixel consolidated for every region.

Preferably, the information of each pixel in the object image is the difference information obtained by carrying out a difference process between a background image obtained by shooting only the background of the object of interest and the object image.

Preferably, the extraction means includes difference processing means, mean value output means, and threshold value processing means. The difference processing means carries out a difference process between the background image obtained by shooting only the background of the object of interest and the object image. The mean value output means obtains the mean value in each region of the absolute value of the difference obtained by the difference process. The threshold value processing means compares the mean value in a region with a predetermined value to extract the region where the mean value is equal to or greater than the predetermined value as the object portion.
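A minimal sketch of this region-level difference and threshold processing follows; the function name, the use of a precomputed region label map, and the NumPy usage are assumptions made only for illustration:

```python
import numpy as np

def extract_object_regions(object_img, background_img, region_labels, threshold):
    """Return a boolean mask marking the object portion.

    object_img, background_img: (H, W) grayscale arrays.
    region_labels: (H, W) integer array giving, for every pixel, the index of
                   the region produced by the region segmentation means.
    threshold:     regions whose mean absolute difference is equal to or
                   greater than this value are taken as the object portion.
    """
    diff = np.abs(object_img.astype(float) - background_img.astype(float))
    mask = np.zeros(object_img.shape, dtype=bool)
    for label in np.unique(region_labels):
        region = (region_labels == label)
        if diff[region].mean() >= threshold:  # mean of |difference| per region
            mask |= region
    return mask
```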

Preferably, the extraction means comprises mean value output means, difference processing means, and threshold value processing means. The mean value output means computes the mean value of the pixels in each region of the object image. The difference processing means carries out a difference process between the mean value of the pixels in each region of the object image and the mean value of the pixels in the corresponding region of the background image. The threshold processing means compares the absolute value of the difference obtained by the difference process with a predetermined value to extract the region where the absolute value of the difference is greater than the predetermined value as the object portion.

Preferably, the information of each pixel of the object image is the depth information.

According to yet another aspect of the present invention, the object extraction apparatus of extracting an object portion with an unwanted area removed from the object image obtained by shooting the object of interest includes: depth information computation means, region segmentation means, mean value computation means, and extract means. The depth information computation means computes the depth information of the object image. The region segmentation means divides the object image into a plurality of regions. The mean value computation means computes the mean value of the depth information for each region. The extract means extracts as an object portion a region out of the plurality of regions that has a mean value within a predetermined range, i.e. a region having a mean value smaller than a predetermined value, particularly when an object located in front of the object of interest is not included in the object image.

According to yet a still further aspect of the present invention, an object extraction apparatus of extracting a portion of an object with an unwanted portion removed from the object image on the basis of an object image obtained by shooting an object of interest and a plurality of background images obtained by shooting only the background of the object of interest a plurality of times includes difference means, extraction means, and threshold value determination means. The difference means computes the difference between the object image and the background image. The extraction means extracts a portion of the object image having a difference greater than the threshold value as the object portion. The threshold value determination means determines the threshold value in a statistical manner on the basis of the distribution of the plurality of background images.

According to an additional aspect of the present invention, an object extraction apparatus of extracting a portion of an object with an unwanted portion removed from an object image on the basis of an object image obtained by shooting an object of interest and a plurality of background images obtained by shooting only the background of the object of interest a plurality of times includes computation means, difference means, and extraction means. The computation means computes, for every pixel, the mean value and the standard deviation of the pixels located at the same coordinates in the plurality of background images. The difference means computes the difference between the value of each pixel in the object image and the mean value of the pixels in the background images corresponding to that pixel. The extraction means extracts the pixels of the object image having a difference that is greater than a predetermined multiple of the standard deviation as the object portion.
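A minimal sketch of this per-pixel statistical test follows (the factor k, the function name and the NumPy usage are illustrative assumptions; the text only states that a pixel is extracted when the difference exceeds a predetermined multiple of the standard deviation):

```python
import numpy as np

def extract_by_background_statistics(object_img, background_stack, k=3.0):
    """Extract the object portion by comparing each pixel of the object image
    with the statistics of several background-only shots.

    object_img:       (H, W) grayscale array.
    background_stack: (N, H, W) array of N background images.
    k:                multiple of the per-pixel standard deviation (assumed value).
    """
    mean = background_stack.mean(axis=0)  # per-pixel mean over the N shots
    std = background_stack.std(axis=0)    # per-pixel standard deviation
    diff = np.abs(object_img.astype(float) - mean)
    return diff > k * std                 # True where the pixel is object
```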

According to yet a further aspect of the present invention, an object extraction apparatus of extracting a portion of an object with an unwanted portion removed from an object image on the basis of an object image obtained by shooting an object of interest and a plurality of background images obtained by shooting only the background of the object of interest a plurality of times includes average/standard deviation computation means, region segmentation means, difference means, average difference computation means, average standard deviation computation means, and extract means. The average/standard deviation computation means computes, for every pixel, the mean value and the standard deviation of the pixels located at the same coordinates in the plurality of background images. The region segmentation means divides the object image into a plurality of regions. The difference means computes the difference between the value of each pixel in each region of the object image and the mean value of the corresponding pixels in the region of the background images corresponding to that region. The average difference computation means computes the mean value of the differences for each region. The average standard deviation computation means computes the mean value of the standard deviation for every region. The extract means extracts the region out of the plurality of regions having a mean value of the differences greater than a predetermined multiple of the mean value of the standard deviation.

According to still another aspect of the present invention, an object extraction apparatus of extracting a portion of an object with an unwanted portion removed from an object image on the basis of an object image obtained by shooting an object of interest and a plurality of background images obtained by shooting only the background of the object of interest a plurality of times includes average/standard deviation computation means, region segmentation means, average computation means, difference means, average difference computation means, average standard deviation computation means, and extract means. The average/standard deviation computation means computes, for each pixel, the mean value and the standard deviation of the pixels located at the same coordinates in the plurality of background images. The region segmentation means divides the object image into a plurality of regions. The average computation means computes the mean value of the pixels in each region. The difference means computes the absolute value of the difference between the mean value of the pixels in each region of the object image and the mean value of the pixels in the region of the background images corresponding to that region. The average difference computation means computes the mean value of the absolute values of the difference for each region. The average standard deviation computation means computes the mean value of the standard deviation for each region. The extract means extracts a region out of the plurality of regions having a mean value of absolute values of difference greater than a predetermined multiple of the mean value of the standard deviation.

According to yet another aspect of the present invention, an object extraction apparatus of extracting a portion of an object with an unwanted portion removed from an object image on the basis of an object image obtained by shooting an object of interest and a plurality of background images obtained by shooting only the background of the object of interest a plurality of times includes average/standard deviation computation means, region segmentation means, average computation means, difference means, average standard deviation computation means, and extract means. The average/standard deviation computation means computes, for each pixel, the mean value and the standard deviation of the pixels located at the same coordinates in the plurality of background images. The region segmentation means divides the object image into a plurality of regions. The average computation means computes the mean value of the pixels in each region of the object image, and also the mean value, in each region, of the mean values of the pixels in the background images. The difference means computes the absolute value of the difference between the mean value of the pixels in each region of the object image and the mean value, in the corresponding region, of the mean values of the pixels in the background images. The average standard deviation computation means computes the mean value of the standard deviation for each region. The extract means extracts a region out of the plurality of regions having an absolute value of difference greater than a predetermined multiple of the mean value of the standard deviation as an object portion.

According to still another aspect of the present invention, an object extraction apparatus of extracting an object portion with an unwanted portion removed from an object image on the basis of a plurality of object images obtained by shooting an object of interest a plurality of times and a plurality of background images obtained by shooting only the background of the object of interest a plurality of times includes average/standard deviation computation means, average computation means, region segmentation means, difference means, average difference computation means, average standard deviation computation means, and extract means. The average/standard deviation computation means computes, for each pixel, the mean value and the standard deviation of the pixels located at the same coordinates in the plurality of background images. The average computation means computes, for each pixel, the mean value of the pixels located at the same coordinates in the plurality of object images. The region segmentation means divides the object image into a plurality of regions. The difference means computes the absolute value of the difference between the mean value of the respective pixels in each region of the object image and the mean value of the corresponding pixels in the region of the background images corresponding to the relevant region. The average difference computation means computes the mean value of the absolute values of the difference for every region. The average standard deviation computation means computes the mean value of the standard deviation for each region. The extract means extracts a region out of the plurality of regions having a mean value of the absolute values of difference greater than a predetermined multiple of the mean value of the standard deviation.

According to the above object extraction apparatus, a portion in the object of interest of a color identical to that of the background, if any, can be detected and extracted as a portion of the object. The task to be carried out manually can be reduced. Also, a special shooting environment is dispensable.

According to yet a further aspect of the present invention, a three-dimensional model generation apparatus for generating a three-dimensional model of an object of interest includes: shooting means for shooting the background of an object of interest and for shooting the object of interest including the background; silhouette generation means for obtaining the difference between a background image obtained by shooting only the background and a plurality of object images obtained by shooting the object of interest with the background to generate a plurality of silhouette images; and means for generating a three-dimensional model of the object of interest using the plurality of silhouette images.

The three-dimensional model generation apparatus preferably includes rotary means for rotating the object of interest.

According to yet an additional aspect of the present invention, a three-dimensional model generation apparatus of generating a three-dimensional model of an object of interest includes: silhouette generation means for generating a plurality of silhouette images of the object of interest; estimation means for estimating the existing region of the object of interest in a voxel space according to the plurality of silhouette images; and means for generating a three-dimensional model of the object of interest using the object of interest existing region obtained by the estimation means.

Preferably, the estimation means carries out a voting process on the voxel space.

Preferably, the three-dimensional model generation apparatus further includes threshold value processing means for taking, as the existing region of the object of interest, the portion having a vote score greater than a predetermined threshold value as a result of the voting process.

According to the above three-dimensional model generation apparatus, a special shooting environment such as a backboard of the same color is dispensable since a three-dimensional model is generated using a silhouette image obtained by carrying out difference processing.

Since a three-dimensional model is generated by carrying out a voting process on the voxel space on the basis of a plurality of silhouette images, a three-dimensional model can be generated with high accuracy even when some of the plurality of silhouette images are improper.

Since the three-dimensional model is generated by polygonal approximation of the contour lines of a plurality of cut out planes obtained by cutting the three-dimensional shape of an object of interest, the amount of data for three-dimensional model generation can be reduced to allow high speed processing.

Since a three-dimensional model is generated by polygonal approximation of the contour lines of a plurality of cross sectional shapes of an object of interest, the amount of data for three-dimensional model generation can be reduced to allow high speed processing.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a diagram representing a concept of a conventional method of texture information assignment.

FIG. 2 is a sectional view for showing the problem in the conventional method of texture information assignment.

FIG. 3 is a schematic block diagram showing a structure of a three-dimensional model generation apparatus according to a first embodiment of the present invention.

FIG. 4 is a schematic block diagram showing a structure of a color information assignment processing unit in the three-dimensional model generation apparatus of FIG. 3.

FIG. 5 is a flow chart showing the flow of the process to generate a three-dimensional model from a real object.

FIG. 6A is a diagram to describe image shooting of step S10 in FIG. 5.

FIG. 6B is a diagram for describing silhouette image generation of step S12 in FIG. 5. FIG. 6C is a diagram for describing a voting process of step S14 in FIG. 5. FIG. 6D is a diagram for describing polygon generation of step S16 in FIG. 5. FIG. 6E is a diagram for describing texture mapping of step S18 of FIG. 5.

FIG. 7 is a perspective view representing the concept of the voting process.

FIG. 8 is a cross section of a P plane indicating the hypothetical existing region in the voting process.

FIG. 9 is a sectional view of the P plane representing the concept of the voting process.

FIG. 10A is a diagram for describing the concept of a polygon generation process. FIG. 10B is an enlargement view of the 10B portion in FIG. 10A.

FIG. 11 shows a three-dimensional shape model by polygons.

FIG. 12A is a diagram representing the concept of the process of assigning texture information. FIG. 12B is an enlarged view of the 12B portion in FIG. 12A.

FIG. 13 is a flow chart showing the flow of the process of assigning texture information to each three-dimensional shape constituent element.

FIG. 14 is a flow chart showing the flow of the process of the texture information assignment method according to the first embodiment.

FIG. 15 is a diagram representing the concept of a recording medium in which is recorded the texture information assignment method of the first embodiment.

FIG. 16 is a flow chart showing the flow of the process of the texture information assignment method according to a second embodiment of the present invention.

FIG. 17 is a flow chart showing the flow of the process of a texture information assignment method according to a third embodiment of the present invention.

FIG. 18 is a diagram representing the concept of the method of storing texture information into the color information storage unit of FIG. 4.

FIG. 19 is a flow chart showing the flow of the process of the texture information assignment method according to a fourth embodiment of the present invention.

FIG. 20 is a diagram representing the concept of the texture information assignment method according to a fifth embodiment of the present invention.

FIG. 21 is a flow chart showing a flow of the process of the texture information assignment method according to the fifth embodiment of the present invention.

FIG. 22 is a diagram showing the entire structure of an object extraction apparatus (image cut out apparatus) according to a sixth embodiment of the present invention.

FIG. 23 is a block diagram schematically showing an object extraction apparatus (image cut out apparatus) according to the sixth embodiment of the present invention.

FIG. 24 is a block diagram schematically showing the arithmetic logic unit of FIG. 22.

FIGS. 25A-25C are diagrams to describe in detail the process carried out by the difference processing unit, the mean value output unit, and the threshold value processing unit of FIG. 24.

FIG. 26 is a flow chart showing main components of an object extraction apparatus according to a seventh embodiment of the present invention.

FIG. 27A shows an object image divided into a plurality of regions R obtained in the object extraction apparatus of FIG. 26. FIG. 27B shows an image displaying depth information in luminance. FIG. 27C shows an image of an object portion extracted with the background portion removed from the object image.

FIG. 28 is a flow chart showing main components of an object extraction apparatus according to an eighth embodiment of the present invention.

FIG. 29 is a flow chart showing main components of an object extraction apparatus according to a ninth embodiment of the present invention.

FIG. 30 is a flow chart showing main components of an object extraction apparatus according to a tenth embodiment of the present invention.

FIG. 31 is a flow chart showing main components of an object extraction apparatus according to an eleventh embodiment of the present invention.

FIG. 32 is a block diagram schematically showing a three-dimensional model generation apparatus according to a twelfth embodiment of the present invention.

FIG. 33 is a flow chart showing a flow of the process in the three-dimensional model generation apparatus of FIG. 32.

FIG. 34 is a diagram for describing the perspective ratio obtained at step S8 of FIG. 33.

FIGS. 35A-35C are diagrams to describe the position relationship between the camera and the turntable obtained at step S8 in FIG. 33.

FIG. 36 is a diagram for describing a voxel in the cylindrical coordinate system voxel space used at step S14 of FIG. 33.

FIG. 37 is a diagram for describing the voting process at step S14 of FIG. 33.

FIG. 38 shows the results of the voting process at step S14 of FIG. 33.

FIG. 39A is a diagram to describe the specific contents of polygon generation at step S16 of FIG. 33. FIG. 39B is an enlargement view of the 39B portion in FIG. 39A.

FIG. 40 is a diagram for describing the flow of polygon generation at step S16 of FIG. 33.

FIG. 41 is a diagram showing the relationship between vertices corresponding to the contour lines of adjacent cut out planes obtained at step SA2 of FIG. 40.

FIG. 42 is a diagram to describe the local most proximity point connection strategy at step SA3 of FIG. 40.

FIG. 43 shows a polygon obtained by the local most proximity point connection strategy at step SA3 of FIG. 40.

FIG. 44 shows a part of the flow of the polygon generation by the local most proximity point connection strategy at step SA3 of FIG. 40.

FIG. 45 shows the remaining part of the flow of the polygon generation by the local most proximity point connection strategy at step SA3 of FIG. 40.

FIG. 46 is a diagram for describing the flow of polygon generation by the global shortest connection strategy at step SA3 of FIG. 40.

FIG. 47 shows a CD-ROM in which a program is recorded to generate a three-dimensional model of an object of interest by the computer of FIG. 3.

BEST MODE FOR CARRYING OUT THE INVENTION

Embodiments of the present invention will be described in detail with reference to the drawings. In the drawings, the same or corresponding components have the same reference characters allotted, and their description will not be repeated.

First Embodiment

FIG. 3 is a schematic block diagram showing a structure of a three-dimensional model generation apparatus 1000 to reconstruct a three-dimensional model from an actual object according to a first embodiment of the present invention. Referring to FIG. 3, an object of interest 100 is mounted on a turntable 110. Turntable 110 has its angle of rotation controlled according to, for example, a control signal from a computer 130. A camera 120 shoots the rotating object of interest 100 at every specified angle. The obtained image data is applied to computer 130. Data of the shooting condition such as the rotary pitch of turntable 110 and the like is applied to computer 130 from an input device 140.

Computer 130 extracts a silhouette image from the image information corresponding to each shooting angle according to the image information applied from camera 120 to generate a three-dimensional shape model. Here, a three-dimensional shape model can be represented by a set of, for example, polygons (triangular patches). The aforementioned image information implies numeric information representing the luminance, color, or the gray level corresponding to each pixel output from camera 120. However, representation of a three-dimensional model is not limited to such a representation method. For example, a three-dimensional model can be represented as a group of surface shape elements of different shapes. Therefore, the shape that is the element for representing a shape model is generically referred to as a three-dimensional shape constituent element.

Next, computer 130 applies texture information on the reconstructed shape model according to the image information picked up at each angle. Here, color information (texture information) in the field of CG (Computer Graphics) refers to the image information to represent the asperity, design, pattern, and material quality of the surface of the object. Such a reconstructed three-dimensional model is displayed on a display device 150.

Prior to the detailed description of the method of applying color information (more generally, texture information) with respect to the three-dimensional image information, the flow of generating a three-dimensional shape model from a real object will first be described briefly.

FIG. 5 is a flow chart showing the flow of the process from the generation of an object image up to assignment of texture information to a shape model. FIGS. 6A-6E are diagrams representing the concept of the data process of each flow.

Referring to FIG. 6A, computer 130 controls the angle of rotation of turntable 110 according to the shooting condition data applied from input device 140. Object images A1-An shot by camera 120 for every angle of rotation are input (step S10). If an object image is input from camera 120 at every 10° of rotation, 36 object images A1-An are input in one full turn of rotation.

Referring to FIG. 6B, computer 130 extracts the contour of the object figure from each of the shot object images A1-An to generate silhouette images B1-Bn of the object viewed from various directions (step S12).

The obtained silhouette images B1-Bn indicate the contour of object 100 viewed from various directions. As shown in FIG. 6C, a voting process that will be described afterwards is carried out on the three-dimensional space divided into virtual voxels on the basis of the contour figure of the object viewed from various directions. The existing region of object 100 within voxel space 251 is estimated (step S14).

Referring to FIG. 6D, the object region represented by voxel space 251 is converted into the representation of shape model 300 using a polygon (triangular patch) 27 (step S16). Here, the accuracy of the represented shape must be maintained while suppressing the number of polygons required for the representation. Therefore, polygon 27 can be generated according to the method set forth in the following.

In voxel space 251 represented by the cylindrical coordinate system, the contour line of the cut plane at a surface θ of the cylindrical coordinate system is approximated by polygons to determine the vertices of polygon 27. Then, a triangular patch is generated by connecting the three closest vertices among the respective vertices.

Referring to FIG. 6E, texture information is assigned from the image information picked up at step S10 to each triangular patch of the generated polygons 27 (step S18).

By the above flow, a three-dimensional model 27 is reconstructed in computer 130 on the basis of image information picked up by shooting a real object with camera 120.

Although the above description corresponds to a structure in which an object of interest 100 is placed in a fixed manner on turntable 110 to pick up an image of the object with turntable 110 rotated, the method of capturing image information is not limited to such a structure.

For example, image data can be obtained by shooting object of interest100 from a plurality of viewpoints with camera 120 carried by anoperator moving around stationary object 100. By identifying theposition of camera 120 and the shooting direction for each shot of animage, three-dimensional shape model 300 can be reconstructed andtexture information assigned by this information by a method similar tothat set forth in the following.

Now, each processing step of FIG. 5 will be described in detail.

Image Shooting and Silhouette Image Generation

An image is shot by the above-described structure shown in FIG. 3 by placing target object 100 on turntable 110 and shooting a plurality of object images A1-An while turntable 110 is rotated. Additionally, a background image is shot to extract a silhouette image at the next step of S12.

By the difference processing between object images A1-An and the background image, silhouette images B1-Bn in which only the object of interest is cut out are generated.

An image difference process including a region segmentation process that will be described afterwards, not the simple difference process between images, can be carried out to eliminate the need of a special shooting environment to obtain a background image of a single color, and to allow stable silhouette image generation.

Specifically, object images A1-An are divided into regions. The difference processing with the background image is carried out on a region-by-region basis. Here, difference processing implies the process of computing, per pixel, the difference of the signal intensity between the shot object image information and the background image. Furthermore, the mean of the differences in each region is subjected to the threshold process to extract the object portion.

By the above-described method, a portion of an object having a color identical to that of the background at the pixel level, if any, can be detected as the object portion if there is a color differing from that of the background at the region level. Therefore, the accuracy of the generated silhouette image can be improved.

Voting Process

Silhouette image information of an object of interest 100 picked up from a plurality of viewpoints can be obtained. A voting process set forth in the following is carried out to reconstruct a three-dimensional shape of the object from this plurality of silhouette images.

First, a voxel model and voxel space 251 to describe a three-dimensional shape will be explained with reference to FIG. 7.

A voxel model is a model that describes a three-dimensional shape according to the absence/presence of a three-dimensional lattice point. The space defined by voxels is referred to as voxel space 251. Voxel space 251 is arranged with a size and position that encloses the object to be recognized. Here, this voxel space 251 is represented with the cylindrical coordinate system, which can represent the shape of a target object in a more natural manner with respect to the pickup of an image while rotating object 100 of interest.

Therefore, each voxel implies the volumetric element in which r, θ, and z are divided at equal intervals, where r is the coordinate in the radial direction of the cylindrical coordinate system, θ is the coordinate in the angle direction, and z is the coordinate in the direction of the axis. The voxel model is a representation of a three-dimensional shape by a set of these volumetric elements.

The procedure of reconstructing this voxel model from silhouette images B1-Bn will be described briefly hereinafter.

First, a hypothetical existing region 50 with respect to an object of interest is computed in voxel space 251 according to one silhouette image. Here, a hypothetical existing region 50 implies a conical region with the projection center 51 of camera 120 as the vertex and the object figure of the image as a cross sectional shape, as shown in FIG. 7. In other words, object 100 of interest is always present inside this region.

A voting process implies the process of assigning (voting) the numeric value 1 to each voxel residing within hypothetical existing region 50 computed for one silhouette image, for example.

FIG. 8 shows a cross section of a silhouette image and cylindrical voxel space 251 on a plane P perpendicular to the z axis shown in FIG. 7.

Since the conical region from projection center 51 of camera 120 is pertinent to hypothetical existing region 50, the numeric value of 1 is assigned to each voxel in cylindrical voxel space 251 that lies within this region 50.

FIG. 9 is a cross sectional view of cylindrical voxel space 251 at the cross section of plane P for the case where the voting process is carried out according to the plurality of silhouette images B1-Bn.

FIG. 9 shows the case where the voting process is carried out on cylindrical voxel space 251 according to silhouette images B1-B5 shot from 5 viewpoints. Since the numeric value of 1 is assigned to each hypothetical existing region 50 for the respective silhouette images in the voting process according to each silhouette image, the cross-hatched region in FIG. 9 has hypothetical existing region 50 according to all silhouette images B1-B5 overlapped thereon when the voting process is carried out according to the five silhouette images B1-B5. In other words, the numeric value of 5 is assigned to the voxels in the cross-hatched region as a result of the voting process according to the five silhouette images B1-B5.

Therefore, by extracting only the voxels assigned a value of at least 5 among the voxels in cylindrical voxel space 251, the region where object 100 of interest exists in this cylindrical voxel space 251 can be obtained.

In general, the region where the object of interest exists in cylindrical voxel space 251 can be computed according to a voting process by setting an appropriate threshold value according to the number of shot object images. According to the above process, the region where object 100 is present in cylindrical voxel space 251 can be extracted.
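For illustration only, the following is a minimal Python sketch of such a voting process. It assumes that silhouettes is a list of binary images (1 inside the object figure), that project(k, point) maps a three-dimensional point to pixel coordinates for the camera pose of the k-th image, and that the cylindrical voxel grid is sampled at the center of each (r, θ, z) cell; these names and the grid resolutions are hypothetical and not part of the original disclosure.

import numpy as np

def vote_voxels(silhouettes, project, r_bins=64, t_bins=128, z_bins=64,
                r_max=1.0, z_min=0.0, z_max=1.0):
    # votes[i, j, m] counts how many silhouettes contain the voxel center (r_i, theta_j, z_m).
    votes = np.zeros((r_bins, t_bins, z_bins), dtype=np.int32)
    rs = (np.arange(r_bins) + 0.5) * r_max / r_bins
    ts = (np.arange(t_bins) + 0.5) * 2.0 * np.pi / t_bins
    zs = z_min + (np.arange(z_bins) + 0.5) * (z_max - z_min) / z_bins
    for k, sil in enumerate(silhouettes):
        h, w = sil.shape
        for i, r in enumerate(rs):
            for j, t in enumerate(ts):
                x, y = r * np.cos(t), r * np.sin(t)
                for m, z in enumerate(zs):
                    u, v = project(k, (x, y, z))          # pixel position in image k (assumed helper)
                    if 0 <= v < h and 0 <= u < w and sil[int(v), int(u)]:
                        votes[i, j, m] += 1               # voxel lies inside this hypothetical existing region
    # Keep voxels supported by all silhouettes; the threshold plays the role discussed above.
    return votes >= len(silhouettes)

The threshold in the last line corresponds to requiring a vote from every silhouette; as noted above, a slightly smaller threshold makes the result more tolerant of errors in individual silhouette images.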

As a three-dimensional model generation method using voxel space 251, a cone-silhouetting method is disclosed in U.S. Pat. No. 4,982,438. This system has the problem that any error in the generated silhouette image will directly affect the shape of the object that is reconstructed. In contrast, the three-dimensional model generation method by the voting process is characterized in that, even when there is an error in the basic silhouette image, reduction in the accuracy of the captured three-dimensional shape can be minimized by setting an appropriate threshold value.

Polygon Generation

The object region represented by voxel space 251 is transformed so as to be represented with a shape model 300 using polygons (triangular patches) 27.

FIGS. 10A and 10B represent the concept of such a polygon generation process. Referring to FIGS. 10A and 10B, the contour line of a cut plane at the θ1 plane of the cylindrical coordinate system (a plane where θ=θ1 in the cylindrical coordinate system) of the object region represented by cylindrical voxel space 251 is subjected to polygonal approximation. Each vertex of contour line Lθ1 obtained by this polygonal approximation corresponds to a vertex of polygon 27, as will be described afterwards. Similarly, the contour line of the cut plane at plane θ2 of the cylindrical coordinate system is subjected to polygonal approximation to obtain Lθ2. This operation is carried out on the planes θ corresponding to all the voxels.

Then, each vertex of these contour lines is connected to its respective closest three vertices to generate triangular patches 27. By generating triangular patches 27 by the process of polygonal approximation of the contour line and connection of the closest three vertices, the number of polygons required for representation can be suppressed and the accuracy of the represented shape can be maintained.
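As a rough illustration of this step, the sketch below extracts a crude contour for each angular slice of a cylindrical occupancy grid and stitches adjacent slices into triangles by connecting nearest vertices. The occupancy array occ[t, z, r], the sampling step, and the nearest-vertex rule are simplifying assumptions of this sketch, not a literal transcription of the procedure described above.

import numpy as np

def slice_contour(occ_slice, step=4):
    # occ_slice: (Z, R) boolean slice at one theta; return (z, r) vertices of the outer contour,
    # here crudely approximated by keeping the largest occupied radius of every step-th row.
    verts = []
    for z in range(0, occ_slice.shape[0], step):
        rs = np.nonzero(occ_slice[z])[0]
        if rs.size:
            verts.append((z, int(rs.max())))
    return verts

def stitch(contour_a, contour_b, theta_a, theta_b):
    # Connect each vertex of contour_a with the two closest vertices of contour_b,
    # producing triangles in (theta, z, r) cylindrical coordinates.
    tris = []
    if not contour_b:
        return tris
    for za, ra in contour_a:
        d = [abs(zb - za) for zb, _ in contour_b]
        j = int(np.argmin(d))
        if j + 1 < len(contour_b):
            tris.append(((theta_a, za, ra),
                         (theta_b,) + contour_b[j],
                         (theta_b,) + contour_b[j + 1]))
    return tris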

FIG. 11 shows a three-dimensional shape model 300 representing object 100 with polygons 27.

By the above operation, the shape of the object of interest can be reconstructed within the computer.

Although the above description is provided on the basis of a cylindrical voxel space 251, an orthogonal voxel space can be used instead. By connecting adjacent voxels in the polygon generation process, small polygons can be generated, and then consolidated to reduce the number of polygons.

Texture Mapping

In order to assign texture information to the object shape reconstructed in the computer for a more realistic three-dimensional model, the process of assigning texture information included in the shot object images A1-An to three-dimensional shape model 300 is carried out.

More specifically, the object image from which the texture information of each polygon 27 is to be obtained (referred to as “reference image” hereinafter) is determined. Then, polygon 27 is projected on the reference image. The texture information of that projected area is assigned to the corresponding polygon 27.

FIGS. 12A and 12B are diagrams for describing the concept of the texture information assignment process. For the sake of simplification, it is assumed that there are eight pieces of object image information labeled 1-8 as the reference images. More specifically, there is an object image of the target object shot at every 45°. Description is provided of assigning texture information to shape model 300 according to reference images of a target object shot at every constant angle about one axis of rotation. However, the present invention is not limited to such a case, and can be applied to the case where texture information is applied to shape model 300 according to a plurality of reference images of a target object shot from arbitrary positions and directions.

In determining which reference image is to be set in correspondence with a target polygon 27, the approach of selecting the reference image with the greatest texture information amount for the relevant polygon 27 is to be taken into account.

By assigning a corresponding reference image, i.e. a label number, to each polygon 27 according to the above approach, texture information can be applied to shape model 300 represented by polygons 27.

FIG. 13 is a flow chart showing the process up to texture information application. FIG. 4 is a schematic block diagram showing a structure of a color information assignment processor 200 to assign texture information in computer 130.

Color information assignment processor 200 includes an image storage unit 220 for storing object image information (reference image information) picked up by camera 120, an arithmetic logic unit 210 for generating a shape model 300 of a target object according to reference image information stored in image storage unit 220, a shape storage unit 230 for storing shape model 300 generated by arithmetic logic unit 210, i.e., the position and shape of each polygon 27, and a color information storage unit 240 for storing texture information assigned to each polygon 27 by arithmetic logic unit 210 according to the reference image information stored in image storage unit 220.

Referring to FIGS. 13 and 4, image information obtained by shooting a target object rotated at every predetermined angle is stored in image storage unit 220 (step S20).

According to the picked up image information, arithmetic logic unit 210 generates shape model 300. The shape data is stored in shape storage unit 230 (step S22).

Then, correspondence between a three-dimensional shape constituent element (for example, polygon 27) and the reference image information stored in image storage unit 220 is set by arithmetic logic unit 210 according to the procedure set forth in the following (step S24).

Arithmetic logic unit 210 has the texture information of each corresponding polygon 27 stored in color information storage unit 240 (step S26).

The process of correspondence between a three-dimensional shape constituent element and reference image information of step S24 will be described in further detail hereinafter.

FIG. 14 is a detailed flow chart of the flow for correspondence between a three-dimensional shape constituent element and reference image information of step S24.

In the following process, the amount of texture information is determined according to the degree of match between the normal vector of each three-dimensional shape constituent element (polygon 27) and the normal vector of the image shooting plane (which is parallel to the direction in which the reference image was shot). More specifically, the reference image that most directly faces the relevant polygon 27 is selected as the reference image having the greatest texture information with respect to that polygon 27.

Following the completion of each process of shooting (step S20) and shape model generation (step S22), arithmetic logic unit 210 initializes the variables used in the following calculation.

Specifically, the number of three-dimensional shape constituent elements is inserted into variable Emax, and the number of shot images is inserted into variable Imax. In the following process, the auxiliary variable Icnt that counts the corresponding label number with respect to the shot image is initialized to the value of 0.

Also, all the values of the one-dimensional array variable Prod [i] respectively corresponding to the i-th (i=0˜Emax−1) three-dimensional shape constituent element are initialized to the value of 0, while the values of the one-dimensional array variable Id [i], into which the label number of the reference image corresponding to the i-th three-dimensional shape constituent element is inserted, are all initialized to the value of −1 (step S2402).

Then, auxiliary variable Ecnt to count the number of polygons 27 is initialized to the value of 0 (step S2403).

The inner product of the normal vector of the Icnt-th image shooting plane and the normal vector of the Ecnt-th three-dimensional shape constituent element is computed. This computed value is inserted into variable Vtmp (step S2404).

Arithmetic logic unit 210 compares the value of the Ecnt-th variable Prod [Ecnt] with the value of variable Vtmp.

When determination is made that the value of variable Prod [Ecnt] is equal to or smaller than variable Vtmp (step S2406), the value of variable Vtmp is inserted into variable Prod [Ecnt]. Simultaneously, the current value of count variable Icnt is inserted into variable Id [Ecnt] (step S2408).

When determination is made that the value of variable Prod [Ecnt] is greater than the value of variable Vtmp (step S2406), the value of variable Ecnt is incremented by one (step S2410).

When determination is made that the value of count variable Ecnt is smaller than the number of three-dimensional shape constituent elements Emax (step S2412), control returns to the process of step S2404. The same process is repeated on the next three-dimensional shape constituent element.

When determination is made that the value of variable Ecnt is equal to or greater than the number of three-dimensional shape constituent elements Emax (step S2412), the value of count variable Icnt is incremented by 1 (step S2414).

Then, determination is made whether the value of count variable Icnt is equal to or greater than the number of shot images Imax (step S2416).

When determination is made that the value of variable Icnt is smaller than the value of Imax (step S2416), the process from step S2403 to step S2412 is repeated for the next reference image.

However, when the value of variable Icnt is identified to be equal to or greater than the number of shot images Imax (step S2416), control proceeds to the process set forth in the following.

According to the process from step S2402 to step S2416, the inner product value between the surface normal vector of the relevant reference image and the surface normal vector of every three-dimensional shape constituent element is computed and compared for each reference image. As a result of this process, for each three-dimensional shape constituent element, the largest inner product value among all the reference images processed so far is stored in one-dimensional array variable Prod [Ecnt], and the label number of the reference image giving that value is stored in one-dimensional array variable Id [Ecnt].

Therefore, at the transition from the process of step S2416 to the next process, the label number of the reference image information having the largest inner product value for the corresponding i-th three-dimensional shape constituent element is stored in one-dimensional array variable Id [i].
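A compact sketch of this correspondence step is given below, assuming that polygon_normals is an (Emax, 3) array of unit surface normals of the three-dimensional shape constituent elements and image_normals is an (Imax, 3) array of unit normals of the image shooting planes; the array and function names are illustrative only.

import numpy as np

def assign_by_inner_product(polygon_normals, image_normals):
    prod = np.zeros(len(polygon_normals))        # best inner product found so far (Prod[i])
    ident = np.full(len(polygon_normals), -1)    # label number of that reference image (Id[i])
    for icnt, n_img in enumerate(image_normals):
        vtmp = polygon_normals @ n_img           # inner products for every constituent element at once
        better = vtmp >= prod                    # corresponds to the test of step S2406
        prod[better] = vtmp[better]
        ident[better] = icnt
    return ident

As in the flow chart, an element that never faces any reference image (all inner products negative) keeps the initial label of −1.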

Then, arithmetic logic unit 210 reads out the corresponding reference image information for each three-dimensional shape constituent element from image storage unit 220, and stores the read out information into color information storage unit 240 (step S2418).

According to the above-described structure in which the color information (texture information) obtained from the reference image information having the greatest amount of texture information is assigned to each three-dimensional shape constituent element (polygon 27) forming shape model 300, the texture information most closely approximating the actual object can be assigned to each three-dimensional shape constituent element.

FIG. 15 represents the concept of the structure of a recording medium in which the program to execute the texture assignment method of FIG. 14 by computer 130 is stored.

A magnetic disk, a magneto-optical disk, or a CD-ROM (Compact Disk Read Only Memory) can be used as the recording medium. A program to have computer 130 execute the process of FIG. 14 is described as various process steps in a predetermined programming language, coded, and recorded in recording medium 260.

By operating computer 130 according to the texture information assignment program stored in recording medium 260, the effect as described above can be obtained. In other words, texture information approximating the texture of the actual object can be assigned to shape model 300 reconstructed in computer 130.

Second Embodiment

In the previous first embodiment, the reference image determined as having the greatest texture information amount according to the inner product value with respect to each three-dimensional shape constituent element is selected to apply the texture information to each three-dimensional shape constituent element.

However, there is a case where a portion of the target object cannot be viewed in the object image information shot from a certain direction, depending upon the shape of the target object, as described with reference to FIG. 2. In this event, there may be the case where the reference image having the greatest inner product value with respect to the surface normal vector of the three-dimensional shape constituent element corresponding to this occluded region completely lacks the relevant texture information.

The second embodiment provides a method and apparatus of texture information assignment that is applicable to such an event, and a medium in which the texture information assignment program is recorded.

The structure of the color information assignment processor of the second embodiment is identical to that of color information assignment processor 200 of FIG. 4. The operation carried out by arithmetic logic unit 210 differs from that of the first embodiment, as will be described hereinafter.

In contrast to the first embodiment in which the amount of the texture information is determined by comparing the inner product values between the normal vector of each polygon 27 and the normal vector of each reference image, the second embodiment evaluates the amount of texture information of each reference image information on the basis of the projection area of each polygon 27 with respect to a reference image.

FIG. 16 represents a flow chart of the process to determine the label number of the corresponding reference image information for each polygon 27 according to the projection area of polygon 27 on a reference image.

The flow chart of FIG. 16 is similar to the flow chart of FIG. 14, provided that the value of evaluation is the projection area Atmp of the three-dimensional shape constituent element projected on the reference image, instead of the inner product value Vtmp between the normal vector of the reference image plane and the normal vector of the three-dimensional shape constituent element.

Therefore, at the stage when the process from step S2422 to step S2436 is completed, the label number of the reference image information having the largest projection area for the corresponding i-th three-dimensional shape constituent element is stored in one-dimensional array variable Id [i], and the projection area corresponding to the reference image information having the label number of Id [i] for the corresponding i-th three-dimensional shape constituent element is stored in one-dimensional array variable Area [i].
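A hedged sketch of this variant follows; project(icnt, vertex) is assumed to map a three-dimensional vertex to pixel coordinates on the icnt-th reference image, and triangles is a list of three-vertex tuples. The helper names are illustrative and not part of the original disclosure.

import numpy as np

def projected_area(triangle, icnt, project):
    # Area of the triangle projected on reference image icnt (2-D cross product formula).
    (x0, y0), (x1, y1), (x2, y2) = [project(icnt, v) for v in triangle]
    return 0.5 * abs((x1 - x0) * (y2 - y0) - (x2 - x0) * (y1 - y0))

def assign_by_area(triangles, n_images, project):
    area = np.zeros(len(triangles))              # Area[i]: largest projection area found so far
    ident = np.full(len(triangles), -1)          # Id[i]: label number of that reference image
    for icnt in range(n_images):
        for e, tri in enumerate(triangles):
            atmp = projected_area(tri, icnt, project)   # plays the role of Atmp
            if atmp >= area[e]:
                area[e], ident[e] = atmp, icnt
    return ident, area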

Accordingly, arithmetic logic unit 210 reads out from image storage unit 220 the texture information of the corresponding reference image for every three-dimensional shape constituent element and stores the same into color information storage unit 240.

By the above-described process, texture information can be assigned to each polygon 27 from the reference image information having the greatest texture information amount with respect to shape model 300 reconstructed in a computer and the like, even for an object of interest that has a relatively complicated shape. A similar effect can be provided by operating computer 130 with a medium in which the program of steps S2422 to S2438 of FIG. 16 is recorded.

Third Embodiment

In the above description, the correspondence between a reference image and each polygon 27 was determined according to the amount of texture information with respect to the relevant polygon 27 when target object 100 is reconstructed as shape model 300.

The issue to be taken into account in determining an appropriate reference image for each polygon 27 is not limited to the amount of the texture information. For example, when there is noticeable discontinuity in the texture information assigned between polygons 27, the boundary lines of the polygons will become so appreciable that the reconstructed three-dimensional model 29 will have an extremely unnatural visual result.

Therefore, the method of assigning a reference image to each three-dimensional shape constituent element, i.e. the texture information assignment method, of the third embodiment is directed to selecting a reference image having a large amount of texture information while suppressing the polygon boundary line at the same time.

As previously described in the second embodiment, a larger polygon projection area on a corresponding reference image is desirable in order to select a reference image of a large amount of texture information.

However, high continuity in the color information (texture information) applied between adjacent polygons 27 is desirable in order to hide the polygon boundary line.

The third embodiment is implemented so that assignment of a reference image to a polygon 27 adjacent to a target polygon 27 is carried out by selecting the same reference image, or if different, a reference image with the smallest difference in the shooting angle, to conceal the polygon boundary line.

More specifically, for the purpose of enabling assignment of reference image information to a polygon 27 that satisfies the above-described two conditions in an optimum manner, the problem is seen as the so-called energy minimization problem that is set forth in the following.

Since each reference image is shot by altering the shooting angle for every predetermined angle, a number is assigned in order to each reference image. The correspondence between each polygon 27 and the reference image number (labeling problem) is solved by the iterative improvement process of locally minimizing the energy represented by the following equation.

When each reference image is not shot at every constant predetermined angle, i.e., when the step by which the shooting angle varies differs, the above numbering is to be set in correspondence with the shooting angle. $E = \sum_{i}\left\{ \mathrm{Penalty}(i) - k \times \mathrm{Area}(i) \right\} \qquad (1)$

Here, Area(i) represents the projection area of polygon i on the reference image, Penalty(i) represents the difference in the reference image number (label) between polygon i and the adjacent polygons, and k represents the coefficient of association.

More specifically, energy function E increases as the difference becomes greater between the reference image numbers assigned to the polygons adjacent to polygon i and the reference image number assigned to polygon i, and decreases as the projection area of polygon i on the reference image, i.e. the amount of texture information, increases.

Since the texture continuity is higher, and the polygon boundary line more suppressed, as the difference in the numbers of the reference images assigned to polygon i and the adjacent polygons becomes smaller, minimizing energy function E is equivalent to assigning the optimum reference image number to each polygon taking into account both the amount of texture information (the amount of color information) and the texture continuity.
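For illustration, a minimal evaluation of energy function E of equation (1) might look as follows. It assumes that labels[i] is the reference image number currently assigned to polygon i, that neighbours[i] lists the polygons adjacent to polygon i, and that area(i, label) returns the projection area of polygon i on that reference image. Taking Penalty(i) as the sum of label differences to the adjacent polygons is only one possible reading; the text above merely requires that it grow with the label difference.

def energy(labels, neighbours, area, k):
    e = 0.0
    for i, lbl in enumerate(labels):
        penalty = sum(abs(lbl - labels[j]) for j in neighbours[i])   # grows with label discontinuity
        e += penalty - k * area(i, lbl)                              # decreases with texture amount
    return e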

Although the projection area of polygon i on the reference image is employed as the measure of the texture information amount in the above energy function, a structure can be employed in which evaluation is effected according to the inner product value between the surface normal vector of the polygon and the surface normal vector of the reference image as described in the first embodiment.

Regarding energy function E, coefficient of association k may be a constant or a function of each polygon (for example, a function of the area of each polygon).

Energy function E is not limited to the above-described structure. Besides the linear combination of the above functions Penalty(i) and Area(i), any function that decreases in accordance with improvement in the continuity of the texture information assigned to target polygon i and an adjacent polygon, and that decreases in accordance with the increase of the amount of the texture information with respect to target polygon i, can be used.

FIG. 17 is a flow chart of the process to obtain the optimum value for the above energy function E with the iterative improvement process.

First, provisional correspondence between each polygon of the generated shape model 300 and a reference image number is carried out for initialization (step S2440).

Arithmetic logic unit 210 inserts the number of three-dimensional shape constituent elements into variable N, and initializes count variable Cnt to the value of 0. Also, flag variable Flg is set to OFF (step S2442).

The reference image number corresponding to the Cnt-th three-dimensional shape constituent element is inserted into variable Pre_lbl (step S2444).

Then, the corresponding reference image number is varied for the Cnt-th three-dimensional shape constituent element to extract the reference image number that minimizes energy function E (step S2446).

Then, the new corresponding reference image number obtained at step S2446 is inserted into variable New_lbl (step S2448).

Then, the value of variable New_lbl is compared with the value of variable Pre_lbl. When the values are not equal to each other (step S2450), determination is made that the label is altered by the minimization computation of energy function E. Flag variable Flg is set to ON (step S2452). Then, the value of count variable Cnt is incremented by 1 (step S2454).

When the values of variables New_lbl and Pre_lbl are equal, the flag variable is not altered, and only the value of count variable Cnt is incremented by 1 (step S2454).

When the value of count variable Cnt is smaller than the number of three-dimensional shape constituent elements N, control returns to the process of step S2444. If the value of count variable Cnt is equal to or greater than the number of three-dimensional shape constituent elements N, control proceeds to the next process (step S2456).

Therefore, the process from step S2444 to step S2454 is repeated for all the three-dimensional shape constituent elements.

Then, flag variable Flg is compared with OFF. When flag variable Flg is not equal to OFF (step S2458), determination is made that the label has been changed at least once according to the minimization calculation of energy function E, i.e., that the correspondence setting of the label numbers that locally minimizes energy function E is not completed. Therefore, control returns to step S2442.

Flag variable Flg equal to OFF means that no label is changed even when the operation of minimizing energy function E is carried out according to the process from step S2444 to step S2456. In other words, the current label number correspondence is settled so as to locally minimize energy function E. Thus, the process ends assuming that the optimum correspondence is completed (step S2460).
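The iterative improvement loop of FIG. 17 can be sketched on top of the energy() helper given above: each polygon's label is varied over all reference image numbers, the label that locally minimizes E is kept, and the sweep repeats until no label changes. This is an illustrative reading of the flow chart, not a literal transcription of it.

def improve_labels(labels, neighbours, area, n_images, k):
    labels = list(labels)                       # provisional correspondence (step S2440)
    changed = True
    while changed:                              # plays the role of flag variable Flg
        changed = False
        for i in range(len(labels)):            # Cnt over all three-dimensional shape constituent elements
            pre_lbl = labels[i]
            best = min(range(n_images),
                       key=lambda lbl: energy(labels[:i] + [lbl] + labels[i + 1:],
                                              neighbours, area, k))
            if best != pre_lbl:                 # New_lbl differs from Pre_lbl
                labels[i] = best
                changed = True
    return labels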

By the above process, texture information assignment is carried out that simultaneously optimizes the two conditions of selecting, for each polygon, reference image information having a large amount of texture information and of suppressing the polygon boundary line, in the process of setting the correspondence of a reference image number with respect to a plurality of polygons.

Thus, shape model 300 subsequent to assignment has a color closer to that of the real object and a more natural texture continuity.

A similar effect can be achieved by operating computer 130 with a medium in which the program of step S2440 to step S2460 is recorded.

It is desirable to take into account an appropriate processing sequence for the iterative improvement process, since the order will influence the eventual result of the improvement process. This is because the process is based on the assumption that, in improving the label number of each polygon in the iterative improvement process, the label number of an adjacent polygon is correct, or has high reliability. By carrying out the improvement process sequentially, starting from a polygon of lower reliability, a more favorable improvement result can be obtained.

Evaluation of the reliability of a polygon can be based on the area of the polygon or the area of the polygon projected on the reference image.

This is because the reliability of the provisional correspondence of the reference image number carried out at step S2440 becomes lower as the polygon has a smaller area or has a smaller area projected on the reference image.

Fourth Embodiment

The texture information assignment method of the third embodiment takes into account both the texture information amount (color information amount) and suppression of the polygon boundary line, i.e., texture continuity.

However, in the event of picking up image information from a real object, the image information picked up from a particular direction may differ significantly even from the image information picked up from a nearby direction in terms of glossiness, due to the effect of illumination and the like.

Therefore, there is a case where the method of the third embodiment is not sufficient to assign texture information of higher texture continuity and a suppressed polygon boundary line.

The texture information assignment method of the fourth embodiment is directed to assigning texture information to a corresponding polygon from a plurality of reference image information, i.e. image information picked up from a plurality of directions, not from one piece of reference image information with respect to one polygon.

Prior to the description of the texture information assignment method of the fourth embodiment, the method of storing texture information into color information storage unit 240 will be described in more detail.

FIG. 18 represents the concept of the method of storing data into a color information storage unit.

Color information storage unit 240 stores the basic shape and texture of a three-dimensional shape constituent element. Here, the three-dimensional shape constituent element on the reference image information has a shape differing from the original shape since it is based on the shot shape.

It is therefore necessary to carry out shape transformation to store the color into color information storage unit 240.

Here, shape transformation in the case where the three-dimensional shape constituent element is a triangle will be described. Consider the case of storing the texture information of the basic shape in the two-dimensional discrete space. Let the vertices of the basic shape be (x0, y0), (x1, y1), (x2, y2), and the vertices of the three-dimensional shape constituent element projected on the reference image information be (X0, Y0), (X1, Y1), (X2, Y2). By subjecting these to a first-order transformation with the following transformation matrix A and parallel displacement vector B, the projected triangular shape can be transformed into the original shape. $A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}, \quad B = \begin{pmatrix} e \\ f \end{pmatrix} \qquad (2)$

In this case, the texture information of a pixel (xn, yn) of the basic shape can be acquired from a pixel (Xn, Yn) on the reference image information computed by the following equation. $\begin{pmatrix} X_{n} \\ Y_{n} \end{pmatrix} = \begin{pmatrix} a & b \\ c & d \end{pmatrix} \begin{pmatrix} x_{n} \\ y_{n} \end{pmatrix} + \begin{pmatrix} e \\ f \end{pmatrix} \qquad (3)$

By the above so-called affine transformation, the texture information of the original polygon shape is acquired for the projected triangular polygon, and stored in color information storage unit 240.
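A small sketch of this resampling is shown below: the affine parameters of equations (2) and (3) are solved from the three vertex correspondences and then used to pull the pixel value for every pixel (xn, yn) of the basic shape. Nearest-neighbour sampling and the helper names are simplifying assumptions of this sketch.

import numpy as np

def solve_affine(basic, projected):
    # basic, projected: 3x2 arrays of corresponding triangle vertices (basic shape and projected shape).
    src = np.hstack([np.asarray(basic, float), np.ones((3, 1))])      # rows [x y 1]
    sol, _, _, _ = np.linalg.lstsq(src, np.asarray(projected, float), rcond=None)
    A, B = sol[:2].T, sol[2]                                          # matrix A and displacement vector B of equation (2)
    return A, B

def sample_texture(basic_pixels, reference_image, A, B):
    texture = {}
    for xn, yn in basic_pixels:
        Xn, Yn = A @ np.array([xn, yn]) + B                           # equation (3)
        texture[(xn, yn)] = reference_image[int(Yn), int(Xn)]         # assumes (Xn, Yn) falls inside the image
    return texture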

Although a triangle is taken as the shape of the polygon in the above description, similar computation can be carried out for other shapes such as a rectangle.

The method of coordinate transformation can be carried out using projective transformation as well as the affine transformation. The projective transformation is computed by the following equation. $X_{n} = \frac{a_{1}x_{n} + a_{2}y_{n} + a_{3}}{a_{7}x_{n} + a_{8}y_{n} + 1}, \quad Y_{n} = \frac{a_{4}x_{n} + a_{5}y_{n} + a_{6}}{a_{7}x_{n} + a_{8}y_{n} + 1} \qquad (4)$

As described above, texture information corresponding to the original polygon shape is stored in color information storage unit 240 irrespective of the shape of the polygon projected on the reference image information.

It is assumed that assignment of a reference image information number corresponding to polygon i is completed by the iterative improvement process for energy function E as indicated in the third embodiment.

The texture information assignment method of the fourth embodiment is directed to further improving the texture continuity by carrying out, subsequent to the completion of the label number assignment, a weighted mean process that will be described in the following.

FIG. 19 is a flow chart of the weighted mean process carried out after assignment of a reference image information number with respect to each polygon i.

Therefore, this process is continuous from step S2460 of the flow shown in FIG. 17.

Initialization is carried out by inserting the number of three-dimensional shape constituent elements into variable Emax, and the number of reference image information shot into variable Imax. The value of count variable Ecnt is initialized to 0 (step S2500).

Then, the values of count variable Icnt and variable wacc are initialized to 0 (step S2501).

Determination is made whether the Icnt-th reference image information is the input subject of the texture information of the Ecnt-th three-dimensional shape constituent element (step S2502).

Here, not only the reference image information whose number has already been assigned to the polygon (three-dimensional shape constituent element), but also a predetermined number of reference image information adjacent thereto, for example the reference image information of the immediately preceding and succeeding images, are included in the input subject.

Then, the value of the area of the Ecnt-th three-dimensional shape constituent element projected on the Icnt-th reference image information is inserted into variable wght (step S2504).

The Icnt-th reference image information weighted by variable wght is stored in color information storage unit 240 as the texture information of the Ecnt-th three-dimensional shape constituent element (step S2506).

The value of variable wght is accumulated into variable wacc (step S2508). The value of count variable Icnt is incremented by 1 (step S2509).

The value of count variable Icnt is compared with the number of shot reference images Imax (step S2510).

When the value of variable Icnt is smaller than variable Imax, control returns to the process of step S2502.

When determination is made that the Icnt-th reference image information is not the input subject of the texture of the Ecnt-th three-dimensional shape constituent element at step S2502, control proceeds to step S2509. The value of variable Icnt is incremented by 1 (step S2509). Comparison between the values of variables Icnt and Imax is then carried out.

By repeating the process from step S2502 to step S2510, texture information that is weighted from a predetermined number of reference image information is acquired with respect to the Ecnt-th three-dimensional shape constituent element. The texture information thereof is accumulated in color information storage unit 240.

Then, the texture information accumulated in color information storage unit 240 is divided by the value of variable wacc (step S2512).

By the above process, texture information with respect to the Ecnt-th three-dimensional shape constituent element is stored in color information storage unit 240 as the weighted mean of the texture information from the corresponding predetermined number of reference image information.

In the above process, for each polygon, the area of the polygon projected on the reference image information corresponding to the assigned reference image number and on the adjacent predetermined number of object image information is obtained, and is used as the weighting factor for the weighted mean process.

Here, it is assumed that the number of such object image information is Icnt, that the weighting factor corresponding to this object image information is wght(Icnt), and that there are N pieces of such image information.

The texture information of a polygon is formed of a plurality of pixels. Here, attention is focused on one pixel of the texture information. The position of this pixel projected on the object image information is obtained. The pixel value of the image information at the projected position (i.e., color, density, or luminance) is subjected to the weighted mean process over all of the object image information, i.e. over the N object image information. That value is taken as the pixel value of the texture information of interest. Assuming that the pixel value of the image information at the projected portion is v(Icnt), the weighted mean process corresponds to the computation represented by the following equation.

[Σwght(Icnt)×v(Icnt)]/Σwght(Icnt)  (5)
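As a short illustration, the per-pixel computation of equation (5) is a weighted average; the sketch below assumes that values holds v(Icnt) and weights holds wght(Icnt) for the N contributing reference images.

def weighted_mean(values, weights):
    # values[n] = v(Icnt), weights[n] = wght(Icnt) for the N contributing reference images
    acc = sum(w * v for w, v in zip(weights, values))
    wacc = sum(weights)
    return acc / wacc if wacc else 0.0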

This process is carried out for all the pixels corresponding to the texture information of the polygon. Then, the value of variable Ecnt is incremented by 1 (step S2514).

Then, the value of count variable Ecnt is compared with the number of three-dimensional shape constituent elements Emax (step S2516).

When the value of variable Ecnt is smaller than the value of Emax, control returns to the process of step S2501. Thus, the weighted mean process of the texture information is carried out for all the three-dimensional shape constituent elements.

When the value of count variable Ecnt is equal to or greater than the number of three-dimensional shape constituent elements Emax (step S2516), the process of storing the texture information into color information storage unit 240 ends (step S2518).

More specifically, in the texture information assignment method of the fourth embodiment, correspondence setting of a reference image information number (label number) is first carried out for each polygon. Then, for a predetermined number of reference image information including the corresponding reference image information number (for example, the currently corresponding reference image information and the preceding and succeeding images), the result of the weighted mean process according to the area of the three-dimensional shape constituent element projected on each reference image information is assigned as the texture information of that three-dimensional shape constituent element.

By the weighted mean process of texture information from a predetermined number of reference image information, texture information for a corresponding polygon can be obtained. Therefore, texture information improved in texture continuity can be assigned to the relevant polygon.

For example, even in the case where the glossiness included in the color information for the relevant polygon in the reference image information picked up from a certain direction is particularly high due to the effect of illumination and the like when a real object is shot, the influence can be reduced by the weighted mean process.

A similar effect can be achieved by operating computer 130 with a medium in which the program from step S2500 to step S2518 is recorded as shown in FIG. 19.

Fifth Embodiment

The fourth embodiment applies texture information to a corresponding polygon from a predetermined number of adjacent reference image information after assignment of the reference image number from which the texture information is acquired is completed for each polygon.

However, from the standpoint of attaching great importance to the texture continuity, assignment of a reference image number for each polygon so as to minimize energy function E does not necessarily have to be carried out.

The texture information assignment method of the fifth embodiment is directed to assigning, for each polygon (three-dimensional shape constituent element), texture information from a plurality of reference image information having texture information for that three-dimensional shape constituent element.

For example, texture information can be assigned to a relevant polygon (three-dimensional shape constituent element) from all the reference image information having texture information for that three-dimensional shape constituent element. Alternatively, reference image information can be selected, at random or regularly, from the image information including the texture information for a relevant three-dimensional shape constituent element, and texture information can be assigned to the relevant polygon therefrom.

FIG. 20 represents the concept of the texture information assignment method to a polygon. Texture information is assigned to the relevant polygon from all the reference image information that includes the texture information for the three-dimensional shape constituent element.

As described in the fourth embodiment, texture information corresponding to the original polygon shape is stored in color information storage unit 240 irrespective of the shape of the polygon projected on each reference image information.

When a particular polygon i is of interest, texture information can be acquired by carrying out the weighted mean process according to the projection area from all the reference image information having a projection area that is not 0.

FIG. 21 represents a flow chart of such a texture information assignment method.

After the shooting of a plurality of images of an actual object (step S20) and the shape model generation (step S22), correspondence is set between each three-dimensional shape constituent element and the reference image information in which the projection area of the relevant three-dimensional shape constituent element is not 0 (step S30).

By carrying out the weighted mean process according to the projection area on the basis of the above correspondence, texture information is accumulated in color information storage unit 240 for each three-dimensional shape constituent element (step S32).

In the texture information assignment method of the fifth embodiment, the weighted mean process over a plurality of reference image information is carried out for every three-dimensional shape constituent element, with the area of the three-dimensional shape constituent element projected on each of the plurality of reference image information as the weighting factor. The result of the weighted mean process obtained for each three-dimensional shape constituent element is assigned as the texture information to each three-dimensional shape constituent element.

By assigning texture information on each three-dimensional shape constituent element from all the reference image information that includes texture information, texture continuity is further improved.

Even in the case where the information of a reference image shot from a certain direction is considerably higher in glossiness than the information of a reference image shot from another direction due to the effect of illumination and the like, this influence of the texture information in the particular direction can be suppressed by applying the weighted mean process on the texture information from all the relating reference image information.

The present invention is not limited to the above-described first to fifth embodiments in which texture information is assigned after converting shape model 300 into polygon data. The plane direction of the surface can be computed for the shape model 300 represented in voxels to assign the texture information.

It is to be noted that assigning texture information after conversion into polygon data is advantageous in that the amount of operation can be reduced significantly, since the planes (polygons) oriented in the same direction can be processed at one time.

Sixth Embodiment

FIG. 22 shows an entire structure of an object extraction apparatus (image cut out apparatus) according to a sixth embodiment of the present invention. Referring to FIG. 22, the object extraction apparatus includes a computer 130. Computer 130 detects and extracts an object portion in the object image according to a program 301 recorded in a CD-ROM 260. Program 301 includes a step S1 of carrying out the region segmentation process of an object image, a step S2 of the storage process of region information, a step S3 of the difference process between the object image and the background image for each region, a step S4 of obtaining the mean value of the absolute values of difference in each region, a step S5 of the detection process of an object portion by comparison between the mean value of absolute values of difference and a threshold value, and a step S6 of extracting the detected object portion. The details of steps S1-S6 will be described afterwards.

FIG. 23 is a block diagram schematically showing an object extraction apparatus (image cut out apparatus) according to the sixth embodiment of the present invention. Referring to FIG. 23, computer 130 corresponding to an object extraction apparatus includes an image storage unit 220, an arithmetic logic unit 210, a region information storage unit 241, and an extracted image storage unit 231. The details of units 220, 210, 231, and 241 will be described afterwards.

FIG. 24 is a block diagram schematically showing arithmetic logic unit 210 of FIG. 23. Referring to FIG. 24, arithmetic logic unit 210 includes a region segmentation unit 9 and an extraction unit 10. Extraction unit 10 includes a difference processing unit 11, a mean value output unit 13, a threshold value processing unit 15 and an object portion extraction unit 16. An object image A is obtained by shooting an object of interest together with the background by a pickup apparatus such as a camera. Background image B is obtained by shooting only the background of the object of interest by a pickup apparatus such as a camera. Background image B and object image A are stored in image storage unit 220 of FIG. 23. Although the background to be shot is generally located behind the object of interest, some of it may be located in front of the object of interest.

Region segmentation unit 9 divides object image A into a plurality of regions (step S1 of program 301 in FIG. 22). The information associated with region segmentation is stored in region information storage unit 241 of FIG. 23 (step S2 of program 301 in FIG. 22). Difference processing unit 11 carries out the difference process between object image A and background image B at the region level obtained by region segmentation unit 9 to acquire the difference (step S3 of program 301 of FIG. 22). The difference is the difference in color information between object image A and background image B obtained on a pixel-by-pixel basis. Mean value output unit 13 obtains the absolute value of the difference to output the mean value of the absolute values of the difference at the region level (step S4 of program 301 of FIG. 22). In other words, mean value output unit 13 provides the mean value of the absolute values of the difference for every region. Threshold value processing unit 15 compares the mean value of the absolute values of the difference in each region with a threshold value to detect a region having a mean value of absolute values of the difference equal to or greater than the threshold value as the object portion (step S5 of program 301 in FIG. 22). The threshold value is set empirically. Object portion extraction unit 16 extracts the object portion detected by threshold value processing unit 15 (step S6 of program 301 of FIG. 22). In other words, object portion extraction unit 16 outputs the object portion detected by threshold value processing unit 15. The image of the extracted object portion is stored in extracted image storage unit 231 of FIG. 23.

The region segmentation carried out by region segmentation unit 9 will now be described in detail. Region segmentation is carried out by the generally employed edge extension method, the region-edge common usage method, the facet model method, or the like as described in, for example, "Recent Tendency in Image Processing Algorithm", pp. 227-233, Shin Gijitsu Communications, O plus E, edited by Takagi et al. Here, the edge extension method will be described. First, the edge intensity and edge direction are computed for each pixel from the first-order differential. Secondly, an edge element that has a local maximum value and is greater than a predetermined value (called a strong edge element) is extracted by the maximal value suppression process and the threshold process on the edge intensity. At this stage, the strong edge elements are not necessarily continuous. Thirdly, the edge is extended using a strong edge element that is an end point as the origin. This is the edge extension method.

FIGS. 25A-25C are diagrams to describe in detail the process of difference processing unit 11, mean value output unit 13, threshold value processing unit 15, and object portion extraction unit 16 of FIG. 24. Referring to FIG. 25A, object image 17 is formed of an object portion 19 and a background portion 21. Background image 23 is formed of only background 25. Object image 17 is divided into a plurality of regions a1-an by region segmentation unit 9 of FIG. 24.

The operation of difference processing unit 11 of FIG. 24 will be described with region a1 as the target. Referring to FIG. 25B, the difference in color information between each pixel of region a1 and each pixel of region B1 of background 25 corresponding to region a1 is obtained. Accordingly, a set of differences c1 in region a1 is obtained. Mean value output unit 13 of FIG. 24 obtains the absolute values of the differences forming difference set c1, and obtains the mean value of the absolute values of the differences. Threshold value processing unit 15 of FIG. 24 compares the mean value of the absolute values of the differences forming difference set c1 with the threshold value. When the mean value is equal to or greater than the threshold value, region a1 corresponding to difference set c1 is detected as the object portion. Difference processing unit 11, mean value output unit 13 and threshold value processing unit 15 carry out the above-described difference process, output process of the mean value of the absolute values of difference, and threshold value process for all regions a1-an. Object portion extraction unit 16 extracts the object portion detected by threshold value processing unit 15 from object image 17. FIG. 25C shows object portion 19 extracted as described above. Therefore, unwanted portions such as background portion 21 are removed. When any object located in front of the target object is included in object image 17, that portion will also be removed as an unwanted area.
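A compact sketch of steps S1-S6 for a grey-scale image pair is given below. It assumes a helper segment(image) that returns an integer label map partitioning the object image into regions a1-an (any of the segmentation methods cited above could stand in for it) and an empirically chosen threshold; these names are illustrative only.

import numpy as np

def extract_object(object_image, background_image, segment, threshold):
    labels = segment(object_image)                       # steps S1/S2: region segmentation and storage
    diff = np.abs(object_image.astype(float)
                  - background_image.astype(float))      # step S3: per-pixel difference with the background
    mask = np.zeros(labels.shape, dtype=bool)
    for region in np.unique(labels):
        sel = labels == region
        if diff[sel].mean() >= threshold:                # steps S4/S5: region mean of |difference| vs. threshold
            mask[sel] = True                             # region detected as an object portion
    return np.where(mask, object_image, 0)               # step S6: extract the detected object portion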

According to the object extraction apparatus of the sixth embodiment of the present invention, the object image is divided into a plurality of regions, the mean value of the absolute values of the difference is obtained on a region-by-region basis, and a region having a mean value equal to or greater than the threshold value is extracted as the object portion. Therefore, according to the apparatus, method, and program of object extraction of the sixth embodiment, a portion of the target object having a color identical to that of the background at the pixel level, if any, can be detected and extracted as an object portion as long as there is a color differing from that of the background at the region level. The task to be carried out manually can be reduced. Also, a special shooting environment in which a backboard of the same color must be used is dispensable.

Another example of the difference process carried out by difference processing unit 11 of FIG. 24 will be described hereinafter. In contrast to the above description in which the difference is obtained at the region level, difference processing unit 11 can obtain the difference, not at the region level, but by the difference process between the entire object image and the entire background image. Then, mean value output unit 13 provides a mean value of the absolute values of the difference at the region level obtained at region segmentation unit 9.

Alternatively, the mean value of the pixels in each region of the object image can be computed. Then, the absolute value of the difference between that mean value and the mean value of the pixels in the region of the background image corresponding to that region is computed. By comparing the absolute value of the difference with a predetermined value, the region having an absolute value of difference equal to or greater than the predetermined value can be extracted as the object portion.

Although region segmentation is effected on the basis of an edge in the above sixth embodiment, the present invention can be carried out by taking portions of the same color as the same region. Also, a plurality of region segmentation methods can be combined.

Although a color image was taken as an example in the above sixth embodiment, the present invention is applicable to a black and white image. Also, density information (luminance signal level) can be used instead of the above color information (color signal level).

Although a region that is equal to or greater than the threshold value is directly taken as the object portion in the above sixth embodiment, the present invention is not limited to a process carried out only once. For example, the object portion detected by the first process can be taken as a provisional object portion, and the remainder as a provisional background portion. Then, the brightness of the provisional background portion in the object image is compared with the brightness of the region of the background image corresponding to the provisional background portion to detect change in the illumination status between the background image and the input image. Accordingly, the luminance in the object image can be corrected uniformly to carry out the same process again.

Although the value of the threshold value is constant in the sixth embodiment, the value of the threshold can be modified to differ between the center area and the peripheral area of the image. Alternatively, the value of the threshold can be modified according to the area size of the region. Alternatively, the value of the threshold can be modified according to whether there is an object portion in the neighborhood or not, if the process is to be carried out again.

Although the above sixth embodiment averages the absolute value of the difference in each region and compares the obtained value with a threshold value, determination can be made in another way. For example, determination can be made whether the region is an object portion or not taking into account the degree of variation of the values of difference.

Although the object portion is eventually extracted in the sixth embodiment, the present invention is not limited to this. For example, the invention is applicable to determining whether there is an object or not, in addition to the extraction process. Such determination can be used in the application of sensing an intruder for a building monitoring system.

Seventh Embodiment

FIG. 26 is a flow chart of the entire structure of the object extraction apparatus according to a seventh embodiment of the present invention. Steps S112-S118 of FIG. 26 correspond to the program for computer 130 for extracting an object portion with the background portion removed from the object image obtained by shooting an object of interest. This program is recorded in CD-ROM 260.

This program includes a step S112 of computing the depth information dp (i, j) of the object image obtained at step S111 for every pixel (i, j) by the stereo method, a step S113 of dividing the object image into a plurality of regions R, a step S114 of computing mean value mdp (R) of the depth information for every region R, a step S115 of comparing mean value mdp (R) of the depth information with a predetermined threshold value dpth, a step S116 of removing region R as the background portion if mean value mdp (R) of the depth information is greater than threshold value dpth, more specifically, setting value v (i, j) of each pixel in that region R to 0, a step S117 of extracting region R as the object portion when mean value mdp (R) of the depth information is smaller than threshold value dpth, specifically setting value v (i, j) of each pixel in region R to 1, and a step S118 of determining whether the process of steps S115-S117 has been carried out for all the regions R. Here, luminance (density), color information, or a combination thereof can be used as the value of the pixel.

The operation of the object extraction apparatus according to the seventh embodiment of the present invention will be described with reference to the flow chart of FIG. 26.

At step S111, an object of interest is shot together with the background using a digital still camera or the like to obtain an object image. This object image is stored in image storage unit 220 in computer 130. Accordingly, v (i, j) is obtained as the value of each pixel (i, j). Although a still camera that shoots a still picture is used here, a video camera, a digital camera, or the like that shoots a motion picture can be used instead.

At step S112, the depth information dp (i, j) of each pixel (i, j) is computed according to the stereo method or the like. The stereo method is disclosed in, for example, “Computer Vision”, Prentice Hall, pp. 88-93 by D. H. Ballard et al. According to the stereo method, an object of interest is shot from two viewpoints separated by a predetermined distance. A corresponding point between the two obtained object images is determined to compute the depth information dp (i, j) using the reverse projection transformation method or the simple triangulation method. An application of the stereo method is disclosed in, for example, Japanese Patent Laying-Open No. 8-331607. Although the stereo method is employed to compute the depth information, the shape-from-motion method based on motion, the iterative improvement method (one kind of relaxation method) taking into consideration both similarity and continuity, and the like can be used instead.
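
As a purely illustrative sketch (the embodiment only states that the stereo method is used, so the particular parallel-camera triangulation formula below and the names stereo_depth, focal_length_px, and baseline are assumptions), depth from a matched point pair can be computed as depth = f · B / d, where d is the disparity between the two images.

```python
def stereo_depth(disparity, focal_length_px, baseline):
    """Simplified triangulation for a rectified, parallel stereo pair:
    depth = f * B / d. A larger disparity means the point is closer.
    `disparity` is in pixels; `baseline` is in the same unit as the returned depth.
    """
    if disparity <= 0:
        return float("inf")   # no measurable parallax: treat as distant background
    return focal_length_px * baseline / disparity

# Example (assumed values): f = 800 px, baseline = 0.1 m, disparity = 16 px -> depth = 5.0 m
dp = stereo_depth(16, 800, 0.1)
```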

At step S113, which is parallel to step S112, the shot object image is divided into a plurality of regions R as in the above sixth embodiment. The depth information computation of step S112 and the region segmentation of step S113 do not have to be carried out at the same time. Computation of the depth information can be followed by the region segmentation, or vice versa.

FIG. 27A shows the object image divided into a plurality of regions R. FIG. 27B shows an image with the depth information represented by the luminance of the pixel. A pixel of higher luminance indicates that the distance from the shooting position is closer, whereas a pixel of lower luminance indicates that the distance from the shooting position is more distant. Therefore, the object portion is bright and the background portion is dark.

At step S114, the mean value mdp (R) of the depth information is computed for each region R according to the following equation (6), where the sum is taken over region R and n is the number of pixels in region R.

$${mdp}(R) = \frac{\sum\limits_{R}{dp}(i,j)}{n} \qquad (6)$$

At step S115, the computed mean value mdp (R) of the depth information is compared with a threshold value dpth. This threshold value dpth is determined in advance empirically.

When mean value mdp (R) of the depth information is greater than threshold value dpth, the value v (i, j) of all the pixels within that region R is set to 0 at step S116. In other words, that region R is removed from the object image as the background portion. When mean value mdp (R) of the depth information is smaller than threshold value dpth, the value v (i, j) of all the pixels in that region R is set to 1 at step S117. In other words, that region R is extracted as the object portion from the object image.
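
A minimal Python sketch of steps S114-S117 follows; it assumes a precomputed depth map and region label map, and the names extract_by_region_depth and dpth are illustrative only.

```python
import numpy as np

def extract_by_region_depth(depth_map, labels, dpth):
    """Sketch of steps S114-S117: for each region R, compute the mean depth
    mdp(R) of equation (6) and keep the region as the object portion when
    mdp(R) is smaller than the threshold dpth.
    Returns v(i, j): 1 for object pixels, 0 for background pixels.
    """
    v = np.zeros(depth_map.shape, dtype=np.uint8)
    for r in np.unique(labels):
        region = (labels == r)
        mdp = depth_map[region].mean()   # equation (6): sum of dp(i, j) over R divided by n
        v[region] = 1 if mdp < dpth else 0
    return v
```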

At step S118, determination is made whether the process of steps S115-S117 has been carried out for all the regions R. When the above process has been carried out for all the regions R, an object image as shown in FIG. 27C is obtained.

According to the seventh embodiment, the mean value of the depth information is computed for every region R of the object image, and a region having a mean value smaller than the predetermined threshold value is extracted as the object portion. Therefore, by removing only the background portion from the object image, the object portion can be properly cut out along its contour as shown in FIG. 27C. Furthermore, it is not necessary to shoot only the background of the object of interest as an additional step since the depth information is used.

Eighth Embodiment

FIG. 28 is a flow chart showing the main components of an object extraction apparatus according to an eighth embodiment of the present invention. In FIG. 28, steps S222 and S224-S227 are stored in CD-ROM 260 as a program for removing the background portion from the object image to extract the object portion according to an object image obtained by shooting an object of interest and a plurality of background images obtained by shooting only the background of the object of interest a plurality of times.

This program includes a step S222 of computing for every pixel the mean value m (i, j) and the standard deviation σ (i, j) of pixels located at the same coordinates in the plurality of background images obtained at step S221, a step S224 of computing the absolute value |v (i, j)−m (i, j)| (simply referred to as “difference” hereinafter) of the difference between value v (i, j) of each pixel in the object image obtained at step S223 and the mean value m (i, j) of the pixels in the background images corresponding to that pixel, and comparing that difference |v (i, j)−m (i, j)| with k times the standard deviation σ (i, j), a step S225 of setting, when difference |v (i, j)−m (i, j)| is smaller than kσ (i, j), value v (i, j) of that pixel to 0 to remove that pixel as the background portion, a step S226 of, when difference |v (i, j)−m (i, j)| is greater than kσ (i, j), extracting that pixel as an object portion, i.e. setting value v (i, j) of that pixel to 1, and a step S227 of determining whether the process of steps S224-S226 has been carried out for all the pixels.

The operation of the object extraction apparatus of the eighth embodiment will be described with reference to FIG. 28.

At step S221, only the background of the object of interest is shot a plurality of times from the same viewpoint using a digital still camera to obtain a plurality of background images. Taking accuracy into consideration, the number of background images to be obtained is preferably at least three. Taking simplicity into consideration, this number of background images is preferably about ten.

At step S222, the mean value m (i, j) and the standard deviation σ (i, j) of the pixels located at the same coordinates in the plurality of background images are computed for each pixel according to the following equations (7) and (8). Even in the case where an abnormal value is obtained as the pixel value of a background image due to variation in the conversion characteristics of the A/D converter that converts the image signal, variation in the illumination characteristics, jitter, and the like, a stable background image can be obtained since the average of the pixel values is computed.

$$m(i,j) = \frac{\sum v(i,j)}{N} \qquad (7)$$

$$\sigma(i,j) = \sqrt{\frac{\sum v(i,j)^{2}}{N} - \left( \frac{\sum v(i,j)}{N} \right)^{2}} \qquad (8)$$

Here, the sums in equations (7) and (8) are taken over the plurality of background images for each pixel position (i, j), and N is the number of background images.

At step S223, the object of interest is shot to obtain an object image. Here, v (i, j) is obtained as the value of each pixel of the object image.

At step S224, the difference |v (i, j)−m (i, j)| between value v (i, j) of each pixel in the object image and mean value m (i, j) of the pixels of the background images corresponding to that pixel is computed.

When the difference |v (i, j)−m (i, j)| is smaller than kσ (i, j), value v (i, j) of that pixel is set to 0 at step S225. As a result, that pixel is removed from the object image as the background portion. When difference |v (i, j)−m (i, j)| is greater than kσ (i, j), value v (i, j) of the pixel is set to 1 at step S226. As a result, that pixel is extracted as the object portion from the object image. Here, k is preferably approximately 3.
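
A minimal Python sketch of steps S222 and S224-S226 is given below; the function name extract_by_background_statistics and the array layout are assumptions, and only the per-pixel mean, standard deviation, and kσ comparison of equations (7) and (8) are taken from the embodiment.

```python
import numpy as np

def extract_by_background_statistics(object_img, background_imgs, k=3.0):
    """Sketch of steps S222 and S224-S226: compute the per-pixel mean m(i, j)
    and standard deviation sigma(i, j) over the stack of background images
    (equations (7) and (8)), then keep a pixel as the object portion when
    |v(i, j) - m(i, j)| > k * sigma(i, j).
    background_imgs: array of shape (num_images, H, W).
    Returns a binary mask (1 = object portion, 0 = background portion).
    """
    m = background_imgs.mean(axis=0)                    # equation (7)
    sigma = background_imgs.std(axis=0)                 # equation (8)
    diff = np.abs(object_img.astype(float) - m)         # |v(i, j) - m(i, j)|
    return (diff > k * sigma).astype(np.uint8)
```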

At step S227, determination is made whether the process of steps S224-S226 has been carried out for all the pixels. When the above process has been carried out for all the pixels, this program ends.

According to the above eighth embodiment, the mean value of the pixels is computed from the plurality of background images. Therefore, the effect of the conversion characteristics of the A/D converter that converts the image signal and of the illumination characteristics can be alleviated. Furthermore, since the standard deviation of the pixels in the plurality of background images is used as the threshold value for determining whether a pixel belongs to the object or the background, an appropriate threshold value can be set automatically. Thus, the object portion can be properly extracted by removing only the background portion from the object image.

Ninth Embodiment

FIG. 29 is a flow chart showing the main components of an object extraction apparatus according to a ninth embodiment of the present invention. In FIG. 29, steps S222, S333B-S336, and S227 are a program for having computer 130 remove the background portion from the object image to extract the object portion according to an object image obtained by shooting an object of interest and a plurality of background images obtained by shooting only the background of the object of interest a plurality of times. This program is stored in CD-ROM 260.

Although the object of interest is shot only once to obtain one object image at step S223 in the previous eighth embodiment, the object of interest is shot a plurality of times at step S333A of the ninth embodiment to obtain a plurality of object images. Therefore, a step S333B is provided to compute mean value mv (i, j) of the pixels located at the same coordinates in the plurality of object images for each pixel. In steps S334-S336, the mean value mv (i, j) of the pixels is used instead of value v (i, j) of the pixel shown in FIG. 28. Therefore, mean value mv (i, j) of the pixels located at the same coordinates in the plurality of object images obtained at step S333A is computed for each pixel.

At step S334, the difference |mv (i, j)−m (i, j)| between mean value mv (i, j) of each pixel in the object images and the mean value m (i, j) of the pixels in the background images corresponding to that pixel is computed. That difference |mv (i, j)−m (i, j)| is compared with kσ (i, j).

When difference |mv (i, j)−m (i, j)| is smaller than kσ (i, j), mean value mv (i, j) of that pixel in the object image is set to 0 at step S335. As a result, that pixel is removed as the background portion. When difference |mv (i, j)−m (i, j)| is greater than kσ (i, j), mean value mv (i, j) of the pixel of the object image is set to 1 at step S336. As a result, that pixel is extracted as the object portion from the object image.

According to the above ninth embodiment, a plurality of object images obtained by shooting the target object a plurality of times are used. Therefore, a robust object image can be obtained, similarly to the background image. Thus, the object portion is extracted more accurately with the background portion removed from the object image.

Tenth Embodiment

FIG. 30 is a flow chart showing the main components of an object extraction apparatus according to a tenth embodiment of the present invention. In FIG. 30, steps S222 and S441-S447 are a program for having computer 130 remove the background portion from the object image to extract an object portion according to an object image obtained by shooting an object of interest and a plurality of background images obtained by shooting only the background of the object of interest a plurality of times. The program is stored in CD-ROM 260.

In contrast to the eighth embodiment of FIG. 28, where the object image is processed for each pixel, the object image of the present tenth embodiment is divided into a plurality of regions R, which are processed individually.

The program includes a step S441 of dividing the object image obtained at step S223 into a plurality of regions R, a step S442 of computing the difference between value v (i, j) of each pixel in each region R of the object image and mean value m (i, j) of the corresponding pixels in region R of the background images corresponding to that region R, and computing the mean value md (R) of the difference represented by the following equation (9) for each region R, and a step S443 of computing for each region R the mean value mσ (R), represented by the following equation (10), of the standard deviation computed at step S222.

$${md}(R) = \frac{\sum\limits_{R} \left| v(i,j) - m(i,j) \right|}{n} \qquad (9)$$

$$m\sigma(R) = \frac{\sum\limits_{R} \sigma(i,j)}{n} \qquad (10)$$

At steps S444-S446, the mean value md (R) of the difference is used instead of difference |v (i, j)−m (i, j)| of FIG. 28. Also, mean value mσ (R) of the standard deviation is used instead of standard deviation σ (i, j). The object image obtained at step S223 is divided into a plurality of regions R at step S441.

At step S442, the difference |v (i, j)−m (i, j)| between value v (i, j) of each pixel in each region R of the object image and mean value m (i, j) of the corresponding pixels in region R of the background images corresponding to that region R is computed. The mean value md (R) of the difference is computed for each region R.

At step S443, the mean value mσ (R) of the standard deviation σ (i, j) obtained at step S222 is computed for each region R.

At step S444, the difference mean value md (R) is compared with kmσ (R). When the difference mean value md (R) is smaller than kmσ (R), value v (i, j) of all the pixels in that region R is set to 0 at step S445. As a result, region R is removed from the object image as the background portion. When difference mean value md (R) is greater than kmσ (R), the values v (i, j) of the pixels in that region R are all set to 1 at step S446. As a result, that region R is extracted as an object portion from the object image.
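
The region-level test of steps S441-S446 can be sketched in Python as follows; the function name extract_by_region_statistics, the label map, and the default k are assumptions made only for illustration of equations (9) and (10).

```python
import numpy as np

def extract_by_region_statistics(object_img, background_imgs, labels, k=3.0):
    """Sketch of steps S441-S446: per-pixel background statistics as in the
    eighth embodiment, then per-region means md(R) (equation (9)) and
    m_sigma(R) (equation (10)); a region is kept as the object portion when
    md(R) > k * m_sigma(R).
    """
    m = background_imgs.mean(axis=0)                 # equation (7)
    sigma = background_imgs.std(axis=0)              # equation (8)
    diff = np.abs(object_img.astype(float) - m)      # |v(i, j) - m(i, j)|
    v = np.zeros(object_img.shape, dtype=np.uint8)
    for r in np.unique(labels):
        region = (labels == r)
        md = diff[region].mean()                     # equation (9)
        msigma = sigma[region].mean()                # equation (10)
        v[region] = 1 if md > k * msigma else 0
    return v
```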

At step S447, determination is made whether the process of steps S444-S446 has been carried out for all the regions R. When the above process has been carried out for all the regions R, the program ends.

According to the above tenth embodiment, the object image is divided into a plurality of regions R, the mean value md (R) of the difference between the value of each pixel in each region R of the object image and the mean value of the corresponding pixels in region R of the background images corresponding to that region R is computed for each region R, and a region having the difference mean value md (R) greater than k times the mean value mσ (R) of the standard deviation is extracted as the object portion. Therefore, the object portion can be extracted more correctly with the background portion removed from the object image.

Although it is preferable to compute the difference between value v (i, j) of each pixel in each region R of the object image and mean value m (i, j) of the corresponding pixels in region R of the background images corresponding to that region R at step S442, it is also possible to compute mean value mv (i, j) of the pixels in each region of the object image and then compute the absolute value of the difference between the mean value of the pixels in each region R of the object image and the mean value m (i, j) of the pixels in region R of the background images corresponding to that region R. In this case, value v (i, j) of each pixel in each region R of the object image is replaced with mean value mv (i, j) of the pixels in each region R of the object image in the flow chart of FIG. 30.

Alternatively, mean value mv (R) of the pixels in each region R of the object image can be computed, and mean value mm (R), over region R, of the mean values m (i, j) of the pixels in region R of the background images corresponding to region R can be computed, to obtain the absolute value of the difference therebetween. An object portion can be extracted on the basis of this value. In this case, |mv (R)−mm (R)| is computed as md (R) in obtaining md (R) at step S442.

Eleventh Embodiment

FIG. 31 is a flow chart showing the main components of an object extraction apparatus according to an eleventh embodiment of the present invention. In contrast to the previous tenth embodiment, where the object of interest is shot one time to obtain one object image at step S223, the object of interest is shot a plurality of times from the same viewpoint at step S333A, similarly to the ninth embodiment, to obtain a plurality of object images in the present eleventh embodiment. Therefore, an object image that is the average of the plurality of object images is segmented into a plurality of regions R at step S551. Accordingly, in steps S555 and S556, mean value mv (i, j) of the pixels is used instead of value v (i, j) of the pixel.

According to the present eleventh embodiment, a plurality of object images are obtained by shooting an object of interest a plurality of times from the same viewpoint. Therefore, variation in the conversion characteristics of the A/D converter and in the illumination characteristics at the time of shooting the object of interest is alleviated. An object portion can be extracted more properly by removing the background portion from the object image.

Twelfth Embodiment

A three-dimensional model generation apparatus according to a twelfth embodiment of the present invention includes, similarly to the first embodiment of FIG. 3, a turntable 110, a camera 120, and a computer 130. Here, a robot arm or the like can be used instead of turntable 110. In other words, any component that can alter the orientation of the object of interest can be used instead of turntable 110.

FIG. 32 is a block diagram schematically showing this three-dimensional model generation apparatus. Referring to FIG. 32, the three-dimensional model generation apparatus includes a pickup unit 109, an image storage unit 220, an arithmetic logic/control unit 113, a shape storage unit 230, and a color information storage unit 240. Pickup unit 109 includes turntable 110 and camera 120 of FIG. 3. Image storage unit 220, arithmetic logic/control unit 113, shape storage unit 230, and color information storage unit 240 are included in computer 130 of FIG. 3.

FIG. 33 is a diagram for describing the flow of the process of the three-dimensional model generation apparatus of FIG. 3. FIGS. 6A-6E are diagrams for describing the specific process of the three-dimensional model generation apparatus of FIG. 3. FIG. 6A corresponds to the shooting operation of the object of interest and the background at step S10 of FIG. 33. FIG. 6B corresponds to generation of a silhouette image at step S12 of FIG. 33. FIG. 6C corresponds to the voting process at step S14 of FIG. 33. FIG. 6D corresponds to generation of a polygon at step S16 of FIG. 33. FIG. 6E is a diagram to describe texture mapping at step S18 of FIG. 33.

Description will be provided hereinafter with reference to FIGS. 3, 6A-6E, 32 and 33. At step S8, calibration is carried out. Calibration in the twelfth embodiment refers to the process of obtaining the internal parameter (perspective ratio) of camera 120 and the position relationship between camera 120 and turntable 110. At step S10, an object of interest and the background are shot. Only the background is shot, without the object of interest placed on turntable 110, to obtain one background image. Also, target object 100 is placed on turntable 110 to be rotated. Target object 100 is shot together with the background at every predetermined angle by camera 120 to result in object images A1-An. For example, target object 100 is rotated by 10° at a time to obtain 36 object images A1-A36. The following description corresponds to the case of obtaining a three-dimensional model 29 on the basis of the 36 obtained object images A1-A36. Here, the position and the angle of depression (or angle of elevation) of the camera are fixed. Camera 120 and turntable 110 are under control of arithmetic logic/control unit 113. The background image and object images obtained at step S10 are stored in image storage unit 220. In the twelfth embodiment, shooting is effected with the camera fixed and the object of interest rotated. In order to reduce the number of times the background is shot, the background is shot only once to obtain one background image. However, to obtain a background image of higher reliability, the background can be shot two or more times to obtain two or more background images.

In the case where target object 100 is shot, including the background, from a plurality of directions about target object 100 with camera 120 fixed and target object 100 rotated, shooting of the background is required only once. However, when target object 100 is shot, including the background, from a plurality of directions about target object 100 with target object 100 fixed and camera 120 moved about target object 100, shooting of the background must be carried out a plurality of times.

At step S12, a silhouette generation unit not shown provides silhouette images. More specifically, a difference process is carried out between each of object images A1-A36 and the background image to result in a plurality of silhouette images B1-B36. Since there are 36 object images A1-A36, 36 silhouette images are obtained. Here, the difference process (the process obtaining the difference) refers to obtaining the difference between the color information of the object image and the color information of the background image for each pixel. At step S14, a voting unit not shown carries out the voting process. On the basis of the plurality of silhouette images B1-B36, a voting process on the cylindrical coordinate system voxel space 251 is carried out. A threshold processing unit (three-dimensional shape acquirement unit) not shown sets the portion with the vote score exceeding a threshold value as the three-dimensional shape (existing region) of target object 100.

Although an orthogonal coordinate system voxel space can be used as the voxel space, it is preferable to use the cylindrical coordinate system voxel space 251. This is because the memory capacity can be suppressed while favorable acquirement of the shape can be effected.

At step S16, a plurality of three-dimensional shape constituent elements 27 (for example, polygons such as triangular patches; for the sake of simplification, a three-dimensional shape constituent element is referred to as a polygon hereinafter) are generated on the basis of the three-dimensional shape of target object 100 obtained at step S14. The three-dimensional shape of target object 100 obtained at step S14 is thus represented by a plurality of polygons 27. The three-dimensional shape represented by polygons 27 is stored in shape storage unit 230. At step S18, the texture corresponding to each polygon 27 generated at step S16 is obtained from the object images and mapped onto each polygon 27. The texture (color information) is stored in color information storage unit 240. The process of steps S12-S18 is carried out by arithmetic logic/control unit 113. The silhouette generation unit, the voting unit, and the threshold processing unit are included in arithmetic logic/control unit 113. Details of the calibration of step S8, the voting process of step S14, and the polygon generation of step S16 are set forth in the following.

Calibration

As the calibration, the internal parameter (perspective ratio) of camera 120 and the position relationship between camera 120 and turntable 110 are obtained. First, the internal parameter (perspective ratio) of camera 120 will be described. FIG. 34 is a diagram to describe the internal parameter (perspective ratio) of camera 120. Referring to FIG. 34, a reference block 31 is shot by camera 120. Here, shooting is effected so that reference block 31 exactly fits a screen 33. The distance L between camera 120 and reference block 31 is measured. Also, the height T of reference block 31 is measured. The perspective ratio is the height T of reference block 31 divided by distance L. In other words, the perspective ratio is represented as T/L. In the perspective representation, the size of an object projected on a screen is enlarged/shrunk according to the distance from the viewpoint to the object. The parameter determining that ratio of enlargement/shrinkage is the perspective ratio.
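
As a purely illustrative numerical example (the values are assumed and not taken from the embodiment): if reference block 31 has a height of T = 0.2 m and exactly fits the screen when shot from a distance of L = 1.0 m, the perspective ratio is $T/L = 0.2$. An object of height 0.1 m placed 2.0 m from the viewpoint then spans the screen over a full height of $0.2 \times 2.0 = 0.4$ m, so it occupies $0.1/0.4 = 0.25$, i.e. one quarter, of the screen height.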

Measurement of the position relationship between camera 120 and turntable 110 is described hereinafter. FIGS. 35A-35C are diagrams to describe the measurement of the position relationship between a camera and a turntable. FIG. 35A shows camera 120 placed in the coordinate system (xyz coordinate system) of turntable 110. Referring to FIG. 35A, the position (x0, y0, z0) of camera 120 is obtained using the coordinate system (xyz coordinate system) of turntable 110. Also, the angle of rotation α about the optical axis 35 of camera 120 is obtained. FIG. 35B shows the orthogonal projection on plane yz of camera 120 of FIG. 35A. Referring to FIG. 35B, the angle β between optical axis 35 of camera 120 and the y axis is obtained. FIG. 35C shows the orthogonal projection on plane xy of camera 120 of FIG. 35A. Referring to FIG. 35C, the angle γ between optical axis 35 of camera 120 and the y axis is obtained.

More specifically, the position of camera 120 based on the coordinate system (xyz coordinate system) of turntable 110 and angles α, β, and γ are obtained as the position relationship between camera 120 and turntable 110. In the present twelfth embodiment, angles α and γ are set to approximately 0°. Here, angle β is the angle of depression of camera 120 with respect to turntable 110. This angle β is also referred to as the angle of depression of camera 120 with respect to an object of interest placed on turntable 110. Here, the angle of depression includes a negative angle of depression, i.e. an angle of elevation.

Since the angle of depression of the camera with respect to a target object is obtained as part of the calibration in the present twelfth embodiment, a three-dimensional model 29 can be generated on the basis of object images obtained by shooting a target object at this angle of depression. In other words, a three-dimensional model 29 is generated not only on the basis of object images obtained by shooting an object of interest from the horizontal direction (a direction parallel to the xy plane), but also on the basis of object images obtained by shooting the target object from an oblique direction from above. Therefore, sufficient color information can be obtained, including the upper portion of the target object that could not be obtained from an object shot only from the horizontal direction. A three-dimensional model 29 of high accuracy can be generated since a local concave portion of the object of interest can be recognized.

Voting Process

Details of the voting process at step S14 of FIG. 33 will be described. FIG. 36 is a diagram for describing a cylindrical coordinate system voxel space 251 for the voting process. Referring to FIG. 36, cylindrical coordinate system voxel space 251 includes a plurality of voxels 39. For the sake of describing a voxel in cylindrical coordinate system voxel space 251, cylindrical coordinate system voxel space 251 of FIG. 36 is considered as a circular cylinder 25 with a center axis 40. This circular cylinder 25 is cut at a plurality of planes perpendicular to center axis 40. Also, circular cylinder 25 is cut at a plurality of planes including and parallel to center axis 40. Furthermore, circular cylinder 25 is cut at a plurality of rotary planes centered about center axis 40. Each element obtained by cutting circular cylinder 25 in this manner corresponds to a voxel 39 in cylindrical coordinate system voxel space 251.

FIG. 37 is a diagram to describe the voting process. A voting process is carried out on cylindrical coordinate system voxel space 251 on the basis of the 36 silhouette images B1-B36 obtained at step S12 of FIG. 33. In FIG. 37, only two silhouette images B1 and B2 are shown.

Attention is focused on hypothetical existing region 50. FIG. 7 is a diagram to describe a hypothetical existing region. In FIG. 7, only one silhouette image B1 is shown. Referring to FIGS. 37 and 7, a hypothetical existing region 50 is a conical region with projection center 51 of the camera as the vertex and object image 42 (contour of target object 100) of silhouette image B1 as a cross-sectional shape with respect to silhouette image B1. A hypothetical existing region can be defined similarly for the other silhouette images B2-B36. Target object 100 inevitably resides within this hypothetical existing region.

Referring to FIG. 36, a vote of “1” is applied to all voxels 39 in hypothetical existing region 50 in the voting process. This voting process is carried out for all silhouette images B1-B36. For example, a voxel 39 that resides at the overlapping portion of all the hypothetical existing regions corresponding to the 36 silhouette images B1-B36 has a vote score of “36”.

At step S10 of FIG. 33, an object of interest is shot every 10° to obtain 36 object images, and 36 silhouette images B1-B36 are generated at step S12. Therefore, the vertex of a hypothetical existing region (corresponding to the projection center of the camera) is located every 10° around center axis 40. The position of the vertex (corresponding to the projection center of the camera) of the hypothetical existing region is determined according to the calibration result of step S8 of FIG. 33. In other words, the position relationship between silhouette images B1-B36 and the vertex (corresponding to the projection center of the camera) of a corresponding hypothetical existing region is determined by the perspective ratio. More specifically, the breadth angle of the cone which is the hypothetical existing region is determined. By the position relationship between camera 120 and turntable 110, the position relationship between the vertex (corresponding to the projection center of the camera) of the hypothetical existing region corresponding to silhouette images B1-B36 and cylindrical coordinate system voxel space 251 is determined.

FIG. 38 shows the result of the voting process. Referring to FIG. 38, the dark color portion has a high vote score whereas the light color portion has a low vote score. The z axis of FIG. 38 corresponds to center axis 40 of FIG. 37.

Following the voting process on all silhouette images B1-B36, a threshold process is carried out. More specifically, the region of voxels 39 having a vote score equal to or higher than a predetermined threshold value is set as the existing region of target object 100. The shape of this existing region is the three-dimensional shape of target object 100. If the threshold value is “32”, for example, the shape of the region where voxels 39 have a vote score of at least “32” corresponds to the three-dimensional shape of target object 100.
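
The voting and threshold steps can be sketched in Python as below. This is an illustration only: the function name vote_and_threshold is hypothetical, and the projection test that decides whether a voxel center lies inside a silhouette's hypothetical existing region is assumed to be supplied externally (in the embodiment it would be derived from the calibration of step S8).

```python
import numpy as np

def vote_and_threshold(voxel_centers, silhouette_tests, threshold):
    """Sketch of step S14: each silhouette adds one vote to every voxel whose
    center falls inside its hypothetical existing region; voxels whose score
    reaches `threshold` form the existing region (three-dimensional shape).

    voxel_centers: array of shape (num_voxels, 3) with voxel center coordinates.
    silhouette_tests: list of callables; each returns a boolean array telling,
        for every voxel center, whether its projection lies inside the object
        contour of that silhouette image.
    """
    scores = np.zeros(len(voxel_centers), dtype=np.int32)
    for inside_region in silhouette_tests:
        scores += inside_region(voxel_centers).astype(np.int32)  # vote of "1" per silhouette
    return scores >= threshold   # e.g. threshold = 32 out of 36 silhouettes
```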

In the twelfth embodiment, the three-dimensional shape of an object of interest is obtained by the voting process. Therefore, a three-dimensional model 29 of high accuracy can be generated even if there are some improper images among the plurality of silhouette images used in the voting process. Conventionally, a three-dimensional shape is obtained by the logical AND operation of a plurality of hypothetical existing regions. When an object image in a silhouette image is not correct and the contour of the object of interest is not represented properly, so that a portion of the shape of the object of interest is missing, that missing portion cannot be represented in the three-dimensional shape of the object of interest. Here, the existing region of an object of interest in voxel space 251 is estimated by the voting process. If the existing probability of the object of interest in voxel space 251 can be obtained, the existing region of the object of interest can also be estimated by a process other than the voting process.

Polygon Generation

FIGS. 39A and 39B are diagrams for specifically describing polygon generation at step S16 of FIG. 33. FIG. 40 is a diagram for describing the flow of the polygon generation step S16 of FIG. 33. FIG. 39B shows polygons obtained on the basis of contour lines 43 a and 43 b residing in a portion 39B of FIG. 39A. At step SA1, referring to FIGS. 39A and 40, the three-dimensional shape (refer to FIG. 38) of target object 100 obtained according to the result of the voting process is cut by a cross section not shown at a plurality of planes (in FIG. 39A, only three planes 41 a, 41 b and 41 c are shown) to obtain the contour line (in FIG. 39A, only three contour lines 43 a, 43 b and 43 c are shown) of each cut plane (in FIG. 39A, only three cut planes 44 a, 44 b and 44 c are shown). At step S10 of FIG. 33, an object of interest is shot every 10° to obtain the object images, and at step S12, silhouette images B1-B36 for every 10° are generated. Therefore, the three-dimensional shape of target object 100 is cut at a plurality of planes every 10° about center axis 40. In other words, the three-dimensional shape of target object 100 is cut with a plurality of planes so that adjacent planes are at an angle θ of 10°. Each plane that cuts the three-dimensional shape of target object 100 is a plane including center axis 40.

At step SA2, a polygonal approximation unit not shown approximates each contour line of each cut plane with a polygon to obtain the coordinates of the vertices of that polygon. As this polygonal approximation method, the method disclosed in, for example, “An Iterative Procedure for the Polygonal Approximation of Plane Curves”, CGIP, Vol. 1, pp. 244-256, 1972 by U. Ramer can be employed. Then, a connection unit not shown connects adjacent vertices of each cut plane with a straight line. At step SA3, vertices corresponding to the contour lines of each cut plane are connected between adjacent cut planes to generate a polygon. In the polygonal approximation of step SA2, the number of polygons that are eventually generated can be controlled by setting the approximation precision variable.
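
For illustration, a minimal Python sketch of a Ramer-style polygonal approximation is given below; the function name polygonal_approximation and the open-curve formulation are assumptions, and epsilon plays the role of the approximation precision variable mentioned above.

```python
import numpy as np

def polygonal_approximation(points, epsilon):
    """Ramer-style polygonal approximation: keep the endpoints, find the point
    farthest from the chord between them, and recurse while that distance
    exceeds `epsilon` (the approximation precision).
    points: array-like of shape (N, 2), sampled in order along one contour line.
    Returns the retained vertices in order.
    """
    points = np.asarray(points, dtype=float)
    start, end = points[0], points[-1]
    chord = end - start
    norm = np.linalg.norm(chord)
    if norm == 0 or len(points) < 3:
        return np.array([start, end])
    # Perpendicular distance of every point from the chord (2-D cross product).
    dists = np.abs(chord[0] * (points[:, 1] - start[1])
                   - chord[1] * (points[:, 0] - start[0])) / norm
    idx = int(np.argmax(dists))
    if dists[idx] <= epsilon:
        return np.array([start, end])
    left = polygonal_approximation(points[:idx + 1], epsilon)
    right = polygonal_approximation(points[idx:], epsilon)
    return np.vstack([left[:-1], right])   # drop the duplicated split point
```

A smaller epsilon retains more vertices and therefore produces more polygons; a larger epsilon produces a coarser shape model with less data.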

The process of steps SA2 and SA3 will be described with reference to FIG. 39B. At step SA2, contour lines 43 a and 43 b are approximated with polygons, and the coordinates of vertices 45 a and 45 b of the polygons are obtained. As to the plurality of vertices 45 a obtained by the polygonal approximation of contour line 43 a, adjacent vertices 45 a are connected with a straight line. A similar process is carried out for the plurality of vertices 45 b obtained by the polygonal approximation of contour line 43 b. Here, vertices 45 a correspond to contour line 43 a and vertices 45 b correspond to contour line 43 b. At step SA3, a vertex 45 a corresponding to contour line 43 a of cut plane 44 a and a vertex 45 b corresponding to contour line 43 b of cut plane 44 b are connected with a straight line to generate a polygon 27. The local most proximity point connection strategy and the global shortest connection strategy are known as methods to connect vertices 45 a and 45 b with straight lines.

According to the local most proximity point connection strategy, the vertices that are closest to each other, among the vertices obtained by polygonal approximation of one contour line of an adjacent cut plane and the vertices obtained by polygonal approximation of the other contour line of the adjacent cut plane, are connected with a straight line. According to the global shortest connection strategy, the vertices obtained by polygonal approximation of one contour line of an adjacent cut plane and the vertices obtained by polygonal approximation of the other contour line of the adjacent cut plane are connected with straight lines so that the sum of the lengths between vertices becomes minimum.

Details of the local most proximity point connection strategy will be provided. FIG. 41 shows the relationship of vertices corresponding to the contour lines of adjacent cut planes. Here, a cut plane Scnt and a cut plane Scnt+1 are taken as examples of adjacent cut planes. Referring to FIG. 41, vertices a, b, c, d, e and f are obtained by polygonal approximation of the contour line of cut plane Scnt. Vertices A, B, C, D, E, F and G are obtained by polygonal approximation of the contour line of cut plane Scnt+1. Since it is premised that the polygon is generated using cylindrical coordinate system voxel space 251, vertex a and vertex A are the same point, and vertex f and vertex G are the same point.

FIG. 42 is a diagram for describing the local most proximity point connection strategy. Referring to FIG. 42, the horizontal direction corresponds to vertices a-f of cut plane Scnt whereas the vertical direction corresponds to vertices A-G of cut plane Scnt+1. The number at each lattice point (the number in the circle) represents the distance between the vertices a-f (FIG. 41) corresponding to the contour line of cut plane Scnt and the vertices A-G (FIG. 41) corresponding to the contour line of cut plane Scnt+1. For example, at the crossing between d and D (the lattice point determined by d and D), the distance between vertex d and vertex D of FIG. 41 is indicated. More specifically, the distance between vertices d and D of FIG. 41 is “2”.

Referring to FIGS. 41 and 42, first an initial polygon is generated according to the local most proximity point connection strategy. The following two methods are known as this initial polygon generation method. The first method connects vertices b and B with a straight line unconditionally. In the second method of initial polygon generation, the pair with the shortest distance among the distances between vertices b and B, between vertices a and C, and between vertices c and A is selected, and those vertices are connected with a straight line. In the examples of FIGS. 41 and 42, vertices b and B are selected and a straight line is drawn therebetween in both of the above two initial polygon generation methods.

Connection between vertices c and B or vertices b and C is then considered. Since the distance between vertices b and C is shorter than the distance between vertices c and B, vertices b and C are connected with a straight line. Then, connection between vertices c and C or vertices b and D is considered. Since the distance between vertices b and D and the distance between vertices c and C are equal, either can be connected. Here, vertices b and D are connected with a straight line. Then, connection between vertices c and D or vertices b and E is considered. Since the distance between vertices c and D is shorter than the distance between vertices b and E, vertices c and D are connected with a straight line. By repeating this process, the vertices corresponding to the contour line of cut plane Scnt and the vertices corresponding to the contour line of cut plane Scnt+1 are connected with straight lines. More specifically, at each lattice point of FIG. 42, the distance between the vertices corresponding to the lattice point located at the right is compared with the distance between the vertices corresponding to the lattice point located below. The vertices corresponding to the lattice point with the shorter distance are connected with a straight line. FIG. 43 shows polygons obtained by connecting vertices a-f and vertices A-G of FIG. 41 by the local most proximity point connection strategy. Components similar to those of FIG. 41 have the same reference characters allotted, and their description will not be repeated. Referring to FIG. 43, vertices a-f and vertices A-G are connected according to the local most proximity point connection strategy to form polygons (triangular patches) 27.
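
The greedy advance described above can be illustrated by the following Python sketch; the function name local_nearest_connection is hypothetical, and it assumes that the first vertices of both cut planes are connected unconditionally (the first method of initial polygon generation).

```python
import numpy as np

def local_nearest_connection(verts_a, verts_b):
    """Sketch of the local most proximity point connection strategy: starting
    from the initial edge between the first vertices of the two adjacent cut
    planes, always add the shorter of the two candidate edges and advance on
    that side, producing triangular patches between the two cut planes.
    verts_a, verts_b: arrays of 3-D vertex coordinates from adjacent cut planes.
    Returns a list of triangles (tuples of three vertex coordinates).
    """
    triangles = []
    i, j = 0, 0
    while i < len(verts_a) - 1 or j < len(verts_b) - 1:
        can_advance_a = i < len(verts_a) - 1
        can_advance_b = j < len(verts_b) - 1
        # Lengths of the two candidate connecting edges (compare right vs. below in FIG. 42).
        d_a = np.linalg.norm(verts_a[i + 1] - verts_b[j]) if can_advance_a else np.inf
        d_b = np.linalg.norm(verts_a[i] - verts_b[j + 1]) if can_advance_b else np.inf
        if d_b <= d_a:
            triangles.append((verts_a[i], verts_b[j], verts_b[j + 1]))
            j += 1
        else:
            triangles.append((verts_a[i], verts_b[j], verts_a[i + 1]))
            i += 1
    return triangles
```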

FIG. 44 is a diagram for describing a part of the polygon generation flow according to the local most proximity point connection strategy. FIG. 45 is a diagram for describing the remaining part of the polygon generation flow according to the local most proximity point connection strategy. Here, the method of connecting the first vertices together unconditionally (the first method of initial polygon generation) is employed. At step SB1 of FIG. 44, the number of cut planes obtained at step SA1 of FIG. 40 is inserted into variable Smax. Also, “0” is inserted into variable Scnt. At step SB2, the vertex number at the Scnt-th cut plane is inserted into variable Vmax, and “0” is inserted into variable Vcnt. At step SB3, the Vcnt-th vertex at the Scnt-th cut plane is connected with the (Vcnt+1)th vertex at the Scnt-th cut plane. As for vertices a-f and A-G of FIG. 41, vertices a and A are the 0-th vertices, vertices b and B are the first vertices, and vertices c and C are the second vertices. At step SB4, Vcnt+1 is inserted into variable Vcnt. When variable Vcnt is equal to or greater than Vmax−1 at step SB5, control proceeds to step SB6. When variable Vcnt is smaller than Vmax−1 at step SB5, control proceeds to step SB3. At step SB6, Scnt+1 is inserted into variable Scnt. When variable Scnt is at least Smax at step SB7, control proceeds to step SB8 of FIG. 45. When variable Scnt is smaller than Smax at step SB7, control proceeds to step SB2.

At step SB8 of FIG. 45, “0” is inserted into variable Scnt. At step SB9, the vertex number at the Scnt-th cut plane is inserted into variable imax. Then, the vertex number at the (Scnt+1)th cut plane is inserted into variable jmax. At step SB10, the initial polygon is generated. Here, the method of connecting the first vertices with each other unconditionally is employed (the first method of initial polygon generation). The first vertex of the Scnt-th cut plane is connected with the first vertex of the (Scnt+1)th cut plane. Then, “1” is inserted into variable i and “1” is inserted into variable j. At step SB11, i+1 is inserted into variable i_n, and j+1 is inserted into variable j_n. At step SB12, dist ([Scnt: i], [Scnt+1: j_n]) denotes the distance between the i-th vertex of the Scnt-th cut plane and the (j_n)th vertex of the (Scnt+1)th cut plane. Also, dist ([Scnt: i_n], [Scnt+1: j]) denotes the distance between the (i_n)th vertex of the Scnt-th cut plane and the j-th vertex of the (Scnt+1)th cut plane. When the distance between the i-th vertex of the Scnt-th cut plane and the (j_n)th vertex of the (Scnt+1)th cut plane is equal to or less than the distance between the (i_n)th vertex of the Scnt-th cut plane and the j-th vertex of the (Scnt+1)th cut plane at step SB12, control proceeds to step SB13. Otherwise, control proceeds to step SB14.

At step SB13, the i-th vertex of the Scnt-th cut plane is connected with the (j_n)th vertex of the (Scnt+1)th cut plane. Then, j_n is inserted into variable j. At step SB14, the (i_n)th vertex of the Scnt-th cut plane is connected with the j-th vertex of the (Scnt+1)th cut plane. Then, i_n is inserted into variable i. When variable i is equal to or greater than imax−1 at step SB15, control proceeds to step SB17. When variable i is smaller than imax−1, control proceeds to step SB16. At step SB17, the i-th vertex of the Scnt-th cut plane is connected with each of the (j˜jmax−1)th vertices of the (Scnt+1)th cut plane. When variable j is equal to or greater than jmax−1 at step SB16, control proceeds to step SB18. When variable j is smaller than jmax−1, control proceeds to step SB11. At step SB18, the j-th vertex of the (Scnt+1)th cut plane is connected with each of the (i˜imax−1)th vertices of the Scnt-th cut plane. At step SB19, Scnt+1 is inserted into variable Scnt. When variable Scnt is smaller than Smax at step SB20, control proceeds to step SB9. When variable Scnt is equal to or greater than Smax, the process ends. Here, the cut planes are numbered from 0 to Smax−1. The vertices of the Smax-th cut plane must be considered in the event that Scnt is Smax−1 in FIG. 45. In this case, the Smax-th cut plane is assumed to be identical to the 0-th cut plane.

Polygon generation according to the global shortest connection strategy will be described in detail hereinafter with reference to FIG. 42. A path with lattice point aA (the crossing point between a and A) as the starting point and lattice point fG (the crossing point between f and G) as the end point is considered. Every time a lattice point is passed, the value of the distance assigned to that lattice point is added as a penalty. The path with the smallest penalty score is obtained. In other words, the shortest path out of the plurality of paths from lattice point aA to lattice point fG is obtained. Such a shortest path can be obtained using the round robin method, the branch-and-bound method, the Dijkstra algorithm, the A* algorithm, and the like. In FIG. 42, the path indicated by the bold solid line is the shortest penalty path (shortest path). The vertices (refer to FIG. 41) corresponding to the lattice points located on the shortest penalty path (shortest path) are connected. For example, the shortest penalty path (bold solid line) passes through the lattice point determined by b and B. Therefore, vertices b and B of FIG. 41 are connected. FIG. 43 shows polygons obtained by connecting with straight lines vertices a-f and vertices A-G of FIG. 41 by the global shortest connection strategy.
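
As one possible illustration (the embodiment names the round robin, branch-and-bound, Dijkstra, and A* methods; the dynamic programming formulation and the function name global_shortest_connection below are assumptions that rely on the path only moving rightward or downward through the lattice), the shortest penalty path can be sketched in Python as follows.

```python
import numpy as np

def global_shortest_connection(verts_a, verts_b):
    """Sketch of the global shortest connection strategy: build the lattice of
    inter-vertex distances (as in FIG. 42) and find, by dynamic programming,
    the monotone path from the first lattice point to the last one with the
    smallest total penalty. Connecting the vertices corresponding to the
    lattice points on this path yields the polygons.
    Returns the list of (i, j) lattice points on the shortest penalty path.
    """
    dist = np.array([[np.linalg.norm(a - b) for b in verts_b] for a in verts_a])
    rows, cols = dist.shape
    cost = np.full((rows, cols), np.inf)
    cost[0, 0] = dist[0, 0]
    for i in range(rows):
        for j in range(cols):
            if i > 0:
                cost[i, j] = min(cost[i, j], cost[i - 1, j] + dist[i, j])
            if j > 0:
                cost[i, j] = min(cost[i, j], cost[i, j - 1] + dist[i, j])
    # Backtrack from the end point to recover the shortest penalty path.
    path, i, j = [(rows - 1, cols - 1)], rows - 1, cols - 1
    while (i, j) != (0, 0):
        if i > 0 and (j == 0 or cost[i - 1, j] <= cost[i, j - 1]):
            i -= 1
        else:
            j -= 1
        path.append((i, j))
    return path[::-1]
```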

FIG. 46 is a diagram to describe the polygon generation flow by the global shortest connection strategy. Steps similar to those of FIGS. 44 and 45 have the same reference characters allotted, and their description will be appropriately omitted. At step SC9 of FIG. 46, the vertices of the Scnt-th cut plane and the vertices of the (Scnt+1)th cut plane are connected so that the connected distance is shortest. At step SC10, Scnt+1 is inserted into variable Scnt. When variable Scnt is smaller than Smax at step SC11, control proceeds to step SC9. When variable Scnt is equal to or greater than Smax, the process ends.

According to the twelfth embodiment, a three-dimensional shape of an object of interest is obtained using a cylindrical coordinate system voxel space 251. The three-dimensional shape is cut by a plurality of planes along the center axis of cylindrical coordinate system voxel space 251, and a shape model 300 is generated according to the contour lines of the cut planes. Therefore, the amount of data for generating shape model 300 is smaller than that for generating a shape model using an orthogonal coordinate system voxel space, and high speed processing is allowed. Furthermore, the polygons 27 that form shape model 300 are generated using polygonal approximation and the local most proximity point connection strategy or the global shortest connection strategy. Therefore, the amount of data is smaller than that of the conventional art that cuts the three-dimensional shape of an object of interest by a plurality of planes perpendicular to the axis of rotation to generate a shape model, so that the processing speed can be further improved. In other words, shape model 300 can be generated in real time.

The present twelfth embodiment has the following advantages in addition to the advantages described in the foregoing. The present twelfth embodiment requires less manual work than the case where a shape model is generated using a three-dimensional digitizer. Furthermore, measurement using a laser is not carried out in the twelfth embodiment. Therefore, the material of the object of interest is not limited, and restrictions in the shooting environment, such as the requirement of measurement in a dark room, can be reduced. According to the twelfth embodiment, three-dimensional model 29 can be generated with the simple structure of turntable 110, camera 120, and computer 130. Furthermore, in the twelfth embodiment, a silhouette image is generated by the difference process and is used to produce shape model 300. Therefore, a special shooting environment such as a backboard of the same color is dispensable.

Although generation of a three-dimensional model 29 using one camera and a turntable to shoot an object of interest every 10° has been described, the number of cameras, the moving means of the shooting viewpoint, and the step of the shooting angle are not limited to those described above. An object of interest can be shot using a plurality of cameras to generate three-dimensional model 29. This provides the advantage that a shape of high accuracy can be acquired. As the moving means of the shooting viewpoint, a turntable under manual control or a robot arm can be used. The step of the shooting angle can be set smaller for a complicated object, and the step of the shooting angle can be varied depending upon the direction. In other words, rotation at a finer step can be effected for shooting in the direction of a complicated outer shape. When the step of the shooting angle of the object of interest is modified, the cut planes used to reconstruct shape model 300 represented by polygons cut from voxel space 251 are modified accordingly. The shooting angle and the cut planes are set in association. Accordingly, the contour information from the silhouette images obtained by shooting can be reflected with high accuracy in the polygon data.

FIG. 47 shows a CD-ROM in which a program 301 for having computer 130 of FIG. 3 generate a three-dimensional model 29 of an object of interest is recorded. Referring to FIG. 47, computer 130 generates a three-dimensional model 29 of an object of interest according to program 301 recorded in CD-ROM 260. Program 301 recorded in CD-ROM 260 includes step S12 of generating silhouette images of an object of interest, step S14 of the voting process, step S16 of polygon generation, and step S18 of texture mapping.

The present invention is not limited to the polygonal approximation technique of a cross section in reconstructing a shape model 300 represented in polygons from a shape model of voxel representation. For example, a shape model represented by meta-balls can be used instead of shape model 300 represented as a wire frame. In the twelfth embodiment, a polygon is generated by carrying out the voting process on voxel space 251 according to the silhouette images. A silhouette image can also be transformed into a polygon directly using the polygonal approximation method. In this case, correction by manual operation is required since a silhouette image is not accurate.

What is claimed is:
 1. A three-dimensional model production apparatus for producing a three-dimensional model of an object of interest, comprising: shooting means for shooting only a background of said object of interest and for shooting said object of interest including a background; silhouette production means for producing a plurality of silhouette images by obtaining difference between a background image obtained by shooting only said background and a plurality of object images obtained by shooting said object of interest including said background; and means for producing a three-dimensional model of said object of interest using said plurality of silhouette images.
 2. The three-dimensional model production apparatus according to claim 1, further comprising rotary means for rotating said object of interest.
 3. A three-dimensional model production apparatus for producing a three-dimensional model of an object of interest, comprising: silhouette production means for producing a plurality of silhouette images of said object of interest; estimation means for estimating an existing region of said object of interest in voxel space according to said plurality of silhouette images; means for producing a three-dimensional model of said object of interest using said existing region of said object of interest obtained by said estimation means, wherein said estimation means carries out a voting process on said voxel space; and threshold value processing means for setting a portion having a voting score of at least a predetermined threshold value as said existing region of said object of interest as a result of said voting process.
 4. A three-dimensional model production method of producing a three-dimensional model of an object of interest, comprising the steps of: shooting only a background of said object of interest by a pickup device to obtain a background image; shooting said object of interest including said background by said pickup device to obtain a plurality of object images; producing a plurality of silhouette images by obtaining difference between said background image and said plurality of object images; and producing a three-dimensional model of said object of interest using said plurality of silhouette images.
 5. The three-dimensional model production method according to claim 4, further comprising the step of rotating said object of interest.
 6. A three-dimensional model production method of producing a three-dimensional model of an object of interest, comprising the steps of: producing a plurality of silhouette images of said object of interest; estimating an existing region of said object of interest in voxel space according to said plurality of silhouette images; producing said three-dimensional model using said estimated existing region of said object of interest, wherein said step of estimating carries out a voting process on said voxel space; and setting a portion having a voting score of at least a predetermined threshold value as said existing region of said object of interest as a result of said voting process.
 7. A medium storing a program for causing a computer to produce a three-dimensional model of an object of interest, said program comprising the steps of: producing a plurality of silhouette images from said object of interest; estimating an existing region of said object of interest in voxel space according to said plurality of silhouette images; producing said three-dimensional model using said estimated existing region of said object of interest, wherein said step of estimating in said program carries out a voting process on said voxel space; and wherein said program further comprises the step of setting a portion having a voting score of at least a predetermined threshold value as said existing region of said object of interest as a result of said voting process.